Scientific Reports. 2021 Aug 25;11:17149. doi: 10.1038/s41598-021-96507-0

Machine learning assisted prediction of the Young’s modulus of compositionally complex alloys

Hrishabh Khakurel 1,#, M F N Taufique 2,✉,#, Ankit Roy 3, Ganesh Balasubramanian 3, Gaoyuan Ouyang 4, Jun Cui 4,5, Duane D Johnson 4,5, Ram Devanathan 2
PMCID: PMC8387451  PMID: 34433841

Abstract

We identify compositionally complex alloys (CCAs) that offer exceptional mechanical properties for elevated temperature applications by employing machine learning (ML) in conjunction with rapid synthesis and testing of alloys for validation to accelerate alloy design. The advantages of this approach are scalability, rapidity, and reasonably accurate predictions. ML tools were implemented to predict Young’s modulus of refractory-based CCAs by employing different ML models. Our results, in conjunction with experimental validation, suggest that average valence electron concentration, the difference in atomic radius, a geometrical parameter λ and melting temperature of the alloys are the key features that determine the Young’s modulus of CCAs and refractory-based CCAs. The Gradient Boosting model provided the best predictive capabilities (mean absolute error of 6.15 GPa) among the models studied. Our approach integrates high-quality validation data from experiments, literature data for training machine-learning models, and feature selection based on physical insights. It opens a new avenue to optimize the desired materials property for different engineering applications.

Subject terms: Structural materials, Theory and computation

Introduction

The conventional alloying method almost always starts with one or two principal metallic elements and proceeds by incorporating further alloying elements to engineer the desired mechanical and chemical properties1–3. The mechanical and chemical properties of the resulting alloy therefore remain controlled by the principal elements. For instance, Fe is the principal element in steels, Cu/Zn in brass, Ni/Co in superalloys and Ti in titanium alloys4–6. About 15 years ago, Yeh and Cantor7,8 introduced a novel alloy concept known as high entropy alloys (HEAs), which consist of multiple principal elements (N = 5 or more elements) in near-equiatomic percentages. The increased complexity introduces higher configurational entropy (growing as $k_B T N \ln N$, where T is the temperature) compared to conventional alloys. As the number of elements N increases, the number of pairs grows as ~$N^2$ and raises the probability of favorable pair-driven formation enthalpy, which introduces a complex-chemistry effect (often referred to as a “cocktail effect”). The mixing of multi-principal elements generally introduces four core effects, namely high mixing entropy, lattice distortion, slow diffusion, and the “cocktail” effect, which result in a simple microstructure and excellent mechanical properties9–13. Further study revealed that in several HEAs, such as the Mo0.5AlNbTa0.5TiZr system, the comparatively low configurational entropy did not overcome the enthalpic contributions, and secondary phases formed instead of only solid-solution phases. A more general term for such alloy systems has therefore emerged, compositionally complex alloys (CCAs)14,15, which is the naming convention used throughout this paper.

The number of possible elemental compositions is much higher for CCAs than for traditional metallic alloys because CCAs comprise multiple principal elements16. This broader compositional space provides an opportunity to improve mechanical properties such as Young’s modulus, yield strength, and hardness. However, it is extremely challenging to select an appropriate composition by trial-and-error experiments or intuition17. Atomistic modeling, such as molecular dynamics (MD) and density functional theory (DFT), and thermodynamic modeling have been used to study phase stabilization, solidification, and crystallization kinetics of CCAs18–25. These techniques are computationally expensive, time consuming, and challenging to apply to large polycrystalline samples, and hence cannot be used on a large scale to narrow down the search space. Moreover, the variety of microstructures makes the calculations more complex and computationally expensive than for traditional alloys, so it is challenging to predict the chemistries and compositions for a target property.

Data-driven research, and more specifically ML, which is widely used in self-driving cars26, image classification27, web searches28, and fraud detection29, is now also employed to solve different challenges in materials science30. For instance, Zhang et al.19 found that atomic size difference (δ), mixing entropy (ΔSmix) and mixing enthalpy (ΔHmix) are the most important features in phase selection of HEAs. Singh et al.31–33 used high-throughput DFT to predict properties across chemical ranges and revealed correlations with valence electron concentration (VEC), size difference (bandwidth) and vacancies. Roy et al.34 proposed that the average melting temperature (Tm) is the most important feature for predicting the Young’s modulus of low, medium and high entropy alloys. Recent efforts utilizing ML35 considered two additional features, the Pauling electronegativity difference and the difference in VEC, and used a neural network (NN) to predict the phases that form in these CCAs. Thus, different features control each property of the alloy, and the importance of features varies from property to property.

Here, we have employed tree-based ensemble ML models, linear regression ML models, and kernel-based ML models to predict the Young’s modulus of CCAs consisting of refractory elements. This work initially identified VEC, average melting temperature and difference in atomic radii as the most important physical properties that control the Young’s modulus of CCAs. The study compared the relative merits of different ML models for a training set of refractory alloy data gathered from published literature. The model predictions were then validated against the Young’s modulus measured for 32 new alloys synthesized and tested as part of this work. The findings offer considerable promise for alloy down-selection based on ML models validated against high-quality experimental data of known provenance.

Methodology

Training data collection and feature selection

Data on Young’s modulus for CCAs were collected from existing literature34,36–38. Two different data sets were used for model training. The first data set contains 154 alloys, a mixture of refractory and non-refractory alloys. The second data set contains 96 refractory alloys of Mo, Nb, Ta and W, mixed with some other elements such as Al, Cr and Ni. The two data sets are presented in Tables 1 and 2 in the supplementary section. The goal of using two different data sets (one with a mixture of refractory and non-refractory alloys and the other with only refractory alloys) was to examine the effect of the elemental composition of the training data on the reliability of the predictions with respect to the experimentally synthesized validation data.

To train the ML models, we calculated 11 features for each alloy; they are listed in Table 1. Past studies have shown that all of these features have a direct effect on the Young’s modulus of an alloy. To obtain them, we collected elemental data identified from domain knowledge, such as Pauling electronegativity, VEC, lattice constant, melting temperature, mixing enthalpy and atomic radii, and then used Python scripts to calculate the features listed in Table 1.

Table 1.

Features of alloys considered in this analysis.

Feature | Description | Reference
$\Delta\chi = \sqrt{\sum_{i=1}^{n} C_i (\chi_i - \bar{\chi})^2}$ | Difference in Pauling electronegativity $\chi_i$ weighted by composition $C_i$ for each element i | 39
$\Delta H_{mix} = \sum_{i=1,\, i \neq j}^{n} 4 H_{ij} C_i C_j$ | Mixing enthalpy derived from enthalpies $H_{ij}$ for a pair of elements i and j | 40
$\Delta S_{mix} = -R \sum_{i=1}^{n} C_i \ln C_i$ | Mixing entropy; R is the universal gas constant | 41
$\delta = \sqrt{\sum_{i=1}^{n} C_i \left(1 - r_i / \bar{r}\right)^2}$ | Difference in atomic radius $r_i$ weighted by composition $C_i$ for each element i | 42
$\Delta a = \sqrt{\sum_{i=1}^{n} C_i (a_i - \bar{a})^2}$ | Difference in lattice constants $a_i$ weighted by composition $C_i$ for each element i | Analogous to δ
$\Delta T_m = \sqrt{\sum_{i=1}^{n} C_i (T_i - \bar{T})^2}$ | Difference in melting temperatures $T_i$ weighted by composition $C_i$ for each element i | Analogous to Δχ
$\lambda = \Delta S_{mix} / \delta^2$ | A geometrical parameter | 43
$\Omega = T_m \Delta S_{mix} / \lvert \Delta H_{mix} \rvert$ | Parameter for predicting solid-solution formation | 42
$T_m = \sum_{i=1}^{n} C_i T_i$ | Average melting temperature calculated by rule of mixtures | 44
$a_m = \sum_{i=1}^{n} C_i a_i$ | Average lattice constant calculated by rule of mixtures | 44
$VEC = \sum_{i=1}^{n} C_i (VEC)_i$ | Average valence electron concentration calculated by rule of mixtures | 40
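As an illustration of how such composition-weighted features might be computed, the following minimal sketch evaluates VEC, Tm, δ, ΔSmix and λ for one alloy. It is not the authors' script: the function name `alloy_features`, the `ELEMENTS` lookup and the elemental radii in it are assumptions made for this example, and δ is taken as the square root of the composition-weighted squared deviation (the conventional definition).

```python
import math

R = 8.314  # universal gas constant, J/(mol K)

# Illustrative elemental data, assumed for this sketch only:
# metallic radius (pm), melting temperature (K), valence electron count.
ELEMENTS = {
    "Mo": {"r": 139.0, "Tm": 2896.0, "vec": 6},
    "Ta": {"r": 146.0, "Tm": 3290.0, "vec": 5},
    "W": {"r": 139.0, "Tm": 3695.0, "vec": 6},
    "Ti": {"r": 147.0, "Tm": 1941.0, "vec": 4},
    "Zr": {"r": 160.0, "Tm": 2128.0, "vec": 4},
}


def alloy_features(composition):
    """Compute a subset of the Table 1 features from atomic amounts."""
    total = sum(composition.values())
    c = {el: amount / total for el, amount in composition.items()}  # fractions sum to 1

    def rule_of_mixtures(key):
        return sum(ci * ELEMENTS[el][key] for el, ci in c.items())

    r_bar = rule_of_mixtures("r")
    delta = math.sqrt(
        sum(ci * (1.0 - ELEMENTS[el]["r"] / r_bar) ** 2 for el, ci in c.items())
    )
    s_mix = -R * sum(ci * math.log(ci) for ci in c.values() if ci > 0)
    return {
        "VEC": rule_of_mixtures("vec"),
        "Tm": rule_of_mixtures("Tm"),
        "delta": delta,
        "S_mix": s_mix,
        # lambda is undefined when all radii are equal (delta = 0)
        "lambda": s_mix / delta ** 2 if delta > 0 else math.inf,
    }


features = alloy_features({"Mo": 1, "Ta": 1, "W": 1, "Ti": 1, "Zr": 1})
```

For an equiatomic five-element alloy the mixing entropy reduces to R ln 5, which gives a quick sanity check on the implementation.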

To examine the association between the features, we calculated the Pearson correlation coefficients (PCC). Figure 1 shows the PCC for the mixed alloys data set and for the refractory alloys data set. In the PCC “heatmap”, P = +1 indicates a perfect positive correlation and P = −1 indicates a perfect negative correlation. Figure 1 shows no significant correlation between any pair of features except Δa and am in Fig. 1a. The ML models considered here can deal with multicollinearity, so this correlation should not have a significant impact on the predictions. We therefore kept all the features in the model.
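The pairwise coefficients behind such a heatmap can be sketched in a few lines of plain Python. The feature values in `toy` are hypothetical numbers chosen for illustration, not data from the paper, and the helper names are assumptions of this example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Nested dict of pairwise PCC values, one entry per feature pair."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical feature table (five alloys, three features).
toy = {
    "VEC": [4.0, 4.5, 5.0, 5.5, 6.0],
    "Tm": [2000.0, 2300.0, 2500.0, 2900.0, 3100.0],
    "delta": [0.06, 0.05, 0.05, 0.03, 0.02],
}
pcc = correlation_matrix(toy)
```

Values near +1 or −1 off the diagonal flag feature pairs that carry largely redundant information.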

Figure 1.

(a) PCC for data with both refractory and non-refractory alloys, (b) PCC for data with only refractory alloys. A value close to 1 or − 1 indicates positive or negative correlation, respectively.

Validation data preparation and Young’s modulus measurement

An experimental data set was used to validate the final model predictions. The validation set consisted of 32 alloys in the Mo-based family of refractory CCAs, containing Mo, Ta, W, Ti, Zr, Al and Cr. The validation alloys were prepared at the Ames Lab Materials Preparation Center in the form of thin metal plates/foils. The alloys (1.5 g each) with selected compositions were synthesized by arc melting using a 32-cavity arc melting system (MTI Corp, SP-MAM32). The actual compositions of the alloys after arc melting were quantified by energy dispersive spectroscopy (EDS). The densities of the samples were measured by the Archimedes method. The arc-melted buttons were then sliced by electrical-discharge machining into near-cylindrical shapes (two parallel sides) with thicknesses of ~3 mm. The elastic modulus values were measured on the cylinders by the ultrasonic pulse-echo technique using a digital ultrasonic thickness gauge (Olympus, 38DL PLUS).
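The text does not state how the pulse-echo readings were converted to a modulus. One common route for an isotropic solid, shown here purely as an assumption, uses the measured density together with the longitudinal and shear sound velocities. The function names and the numerical inputs below are hypothetical.

```python
def sound_speed(thickness_m, round_trip_time_s):
    """Pulse-echo velocity: the pulse crosses the sample thickness twice."""
    return 2.0 * thickness_m / round_trip_time_s


def elastic_constants(density, v_long, v_shear):
    """Isotropic elastic constants from density (kg/m^3) and velocities (m/s).

    Returns (Young's modulus, shear modulus, Poisson's ratio), moduli in Pa.
    """
    vl2, vs2 = v_long ** 2, v_shear ** 2
    shear = density * vs2
    poisson = (vl2 - 2.0 * vs2) / (2.0 * (vl2 - vs2))
    youngs = density * vs2 * (3.0 * vl2 - 4.0 * vs2) / (vl2 - vs2)
    return youngs, shear, poisson


# Hypothetical inputs, not measurements from this work.
E, G, nu = elastic_constants(density=10200.0, v_long=6300.0, v_shear=3400.0)
E_gpa = E / 1e9
```

The three returned quantities satisfy E = 2G(1 + ν), which is a convenient internal consistency check.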

Machine learning models construction

To predict the Young’s modulus, we used eight models: four tree-based ensemble methods (Gradient Boosting, Ada Boost, Extreme Gradient Boost (XGBoost) and Random Forest (RF)), two linear models (LASSO regression and Ridge regression) and two kernel-based methods (Gaussian Process Regression and Support Vector Machine (SVM)). Once the data were collected and the feature values calculated, the eight models were trained on the two data sets separately, giving 16 models: 8 for the larger data set with both refractory and non-refractory alloys and 8 for the smaller data set with only refractory alloys. Five-fold cross-validation was used to determine the errors, because it gives a more robust estimate of the errors than a single train-test split. Many metrics can quantify the predictive strength of a model, such as the root-mean-squared (RMS) error, mean-squared error, mean-absolute error (MAE) and the coefficient of determination R². We chose the MAE as our metric because it most closely matches the form in which errors are reported in most experimental measurements. We also report the R² values for the optimized models.
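The two reported metrics and the five-fold partitioning can be written out explicitly. The sketch below is a plain-Python illustration of the definitions rather than the scikit-learn routines used in the study. The function names and the random seed are assumptions.

```python
import random


def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation, in the units of the target (GPa here)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# 96 matches the size of the refractory training set described above.
splits = list(k_fold_indices(96, k=5))
```

Averaging the metric over the five test folds, and taking its standard deviation, gives entries of the "value ± spread" form used for cross-validated results.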

The errors were minimized by hyper-parameter optimization using the grid-search algorithm, which determines the test error for every combination of the supplied hyper-parameter values; the combination with the lowest error was selected for our model. Each algorithm has a different set of hyper-parameters. Once the best hyper-parameters were selected, the optimized model was used to make predictions for our validation set, whose Young’s modulus had been measured experimentally. Finally, the uncertainty of the predictions (i.e., the standard deviation) was calculated by the bootstrapping method, resampling 100 times for each case. All ML models, as well as the cross-validation and grid search, were implemented with the scikit-learn library for the Python language45,46, except for the XGBoost model, which was implemented through the library created by Tianqi Chen47.
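The workflow described above (grid search with five-fold cross-validation, followed by bootstrap resampling for prediction uncertainty) might look roughly as follows in scikit-learn. This is a sketch under stated assumptions: the data are synthetic stand-ins, the hyper-parameter grid is hypothetical and not the grid used in the study, and the bootstrap is assumed to resample the training set and refit the model each time.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)

# Synthetic stand-in for a 96-alloy, 11-feature training set (not real data).
X = rng.normal(size=(96, 11))
y = 250.0 + 30.0 * X[:, 0] - 10.0 * X[:, 1] + rng.normal(scale=5.0, size=96)

# Hypothetical grid, chosen small so the example runs quickly.
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}

search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_absolute_error",  # scikit-learn maximises, hence the sign
    cv=5,
)
search.fit(X, y)
cv_mae = -search.best_score_

# Bootstrap: refit on resampled training sets and collect the predictions.
X_new = rng.normal(size=(3, 11))  # stand-in for unseen validation alloys
n_boot = 100
preds = np.empty((n_boot, len(X_new)))
for b in range(n_boot):
    idx = rng.integers(0, len(X), size=len(X))  # sample rows with replacement
    model = GradientBoostingRegressor(random_state=0, **search.best_params_)
    preds[b] = model.fit(X[idx], y[idx]).predict(X_new)

mean_pred = preds.mean(axis=0)
std_pred = preds.std(axis=0, ddof=1)
```

Here `mean_pred` and `std_pred` play the role of a mean prediction and its ±σ spread for each unseen alloy.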

Results and discussion

Model optimization

The ML models were first trained on both data sets. The hyper-parameters were optimized, and the training and validation errors were then calculated using five-fold cross-validation. We used these hyper-parameters to construct our final optimized models; they are presented in Table 3 of the supplementary section. The optimized models were used to predict the Young’s modulus for the unseen data, i.e., the experimentally synthesized validation data set. The cross-validated MAE and R² values for all the models are presented in Table 2. From Table 2 it is clear that the performance of the Gradient Boosting model is superior to other models both in terms of accuracy (i.e., the MAE is lower and R² is higher than any other models) and robustness (i.e., the standard deviation of cross-validation is lower). Because of this excellent performance, we discuss below the feature importance and the predictions of Young’s modulus generated by the Gradient Boosting model.

Table 2.

Optimized hyperparameter and cross-validated MAE and R² for both data sets. “Mixed” denotes the refractory and non-refractory data set; “Refractory” denotes the refractory-only data set.

Model | Training MAE (GPa), mixed | Training MAE (GPa), refractory | Test MAE (GPa), mixed | Test MAE (GPa), refractory | Training R², mixed | Training R², refractory | Test R², mixed | Test R², refractory
Gradient Boosting | 0.42 ± 0.26 | 0.36 ± 0.16 | 10.37 ± 1.59 | 6.15 ± 1.19 | 0.99 ± 0.003 | 0.99 ± 0.007 | 0.71 ± 0.080 | 0.90 ± 0.036
XGBoost | 0.33 ± 0.28 | 1.04 ± 0.48 | 10.32 ± 1.50 | 6.68 ± 1.22 | 0.99 ± 0.003 | 0.99 ± 0.008 | 0.70 ± 0.076 | 0.89 ± 0.038
RF | 5.63 ± 0.59 | 5.54 ± 0.63 | 13.53 ± 1.50 | 9.00 ± 1.08 | 0.95 ± 0.009 | 0.96 ± 0.010 | 0.68 ± 0.076 | 0.89 ± 0.031
Ada Boost | 12.79 ± 0.94 | 5.54 ± 0.84 | 18.02 ± 1.57 | 9.31 ± 1.53 | 0.86 ± 0.021 | 0.97 ± 0.011 | 0.62 ± 0.080 | 0.88 ± 0.051
SVM | 14.78 ± 1.61 | 1.90 ± 0.57 | 17.83 ± 1.99 | 6.41 ± 1.39 | 0.64 ± 0.060 | 0.97 ± 0.013 | 0.54 ± 0.074 | 0.87 ± 0.053
Lasso regression | 19.29 ± 1.44 | 17.53 ± 1.14 | 21.09 ± 1.64 | 18.16 ± 1.41 | 0.60 ± 0.060 | 0.72 ± 0.049 | 0.51 ± 0.076 | 0.67 ± 0.172
Ridge regression | 19.37 ± 1.40 | 33.18 ± 3.32 | 21.24 ± 1.95 | 33.34 ± 3.26 | 0.60 ± 0.057 | 0.075 ± 0.007 | 0.51 ± 0.082 | 0.018 ± 0.065
Gaussian process | 33.52 ± 1.90 | 34.08 ± 3.28 | 33.81 ± 1.92 | 34.55 ± 3.32 | 4.95E−6 ± 4.9E−7 | 1.35E−5 ± 2.2E−6 | 0.04 ± 0.028 | 0.090 ± 0.067

For our data sets, tree-based ensemble models predict Young’s modulus better than the other models. Ensemble algorithms have also performed well in other studies that predict materials properties34,48,49. Ensemble methods are meta-algorithms that combine several base models to produce a better predictive model: a bagging ensemble can be used to decrease variance, and a boosting ensemble to decrease bias. A boosting method converts weak learners into strong ones50–52. Decision stumps are usually used as the base weak learners, but this is not always the case. Most boosting methods build the model in a stage-wise fashion and generalize it by optimizing an arbitrary differentiable loss function. Boosting methods also help to prevent over-fitting to some extent, can capture non-linear relations between target properties and features, and help to deal with collinearity among the features. Furthermore, most boosting methods provide the feature importance associated with the model, which is needed to determine which features influence Young’s modulus the most. Boosting methods are affected by the presence of outliers, so it is recommended to perform outlier analysis before training the model.
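The stage-wise idea can be made concrete with a deliberately small example: gradient boosting for squared-error loss on a single feature, with decision stumps as weak learners. This is a teaching sketch, not the scikit-learn or XGBoost implementation used in the study. The function names and the toy data are assumptions.

```python
def fit_stump(x, residuals):
    """Return (threshold, left_value, right_value) minimising squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # dropping the max keeps both sides non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def boost(x, y, n_stages, learning_rate=0.1):
    """Fit stumps stage by stage to the residuals; return training predictions."""
    base = sum(y) / len(y)  # stage 0: constant model
    predictions = [base] * len(y)
    for _ in range(n_stages):
        # For squared-error loss the negative gradient is the residual.
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        threshold, left_value, right_value = fit_stump(x, residuals)
        predictions = [
            pi + learning_rate * (left_value if xi <= threshold else right_value)
            for xi, pi in zip(x, predictions)
        ]
    return predictions


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy non-linear target, chosen only to show the training error shrinking.
x = [i / 10 for i in range(20)]
y = [xi ** 2 for xi in x]
mae_1 = mean_absolute_error(y, boost(x, y, n_stages=1))
mae_50 = mean_absolute_error(y, boost(x, y, n_stages=50))
```

Each stage corrects part of what the previous stages missed, so the training error after 50 stages is lower than after one. The learning rate controls how much of each correction is applied.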

Feature importance

After training the models on both data sets using the optimized hyper-parameters, we determined the feature importance associated with the Gradient Boosting model. Feature importance is a score assigned to each feature according to how useful it is for predicting the target variable. The feature importances for the larger data set containing both refractory and non-refractory alloys and for the smaller data set containing only refractory alloys are presented in Fig. 2a,b, respectively. The ranking of the features is not identical for the two data sets. However, the smaller training data set gave better prediction accuracy, as indicated in Table 2, so we selected the important features obtained from the smaller data set (Fig. 2b). We now explain the physical significance of some of the important features for the Young’s modulus of CCAs.
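In scikit-learn, a fitted Gradient Boosting model exposes these scores through its `feature_importances_` attribute. The sketch below uses synthetic data in which the first column dominates the target by construction, so the resulting ranking illustrates the mechanism only and says nothing about real alloys. The feature names are labels borrowed from Table 1 for readability.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
names = ["VEC", "Tm", "delta", "lambda"]  # labels only; the columns are synthetic

# Synthetic data: the target is built to depend mostly on the first column.
X = rng.normal(size=(96, 4))
y = 250.0 + 40.0 * X[:, 0] + 5.0 * X[:, 1] + rng.normal(scale=2.0, size=96)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Impurity-based importances, normalised by scikit-learn to sum to 1.
ranking = sorted(
    zip(names, model.feature_importances_), key=lambda pair: pair[1], reverse=True
)
```

Because the scores are normalised, an importance above 0.7 means that a single feature accounts for most of the split-quality gain in the ensemble.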

Figure 2.

Feature importance for (a) larger training set containing both refractory and non-refractory alloys, (b) smaller training set containing only refractory alloys.

We found that VEC was the most important feature and had importance higher than 0.7. While it is not shown here, it is important to mention that other ML models i.e. XGBoost and RF showed good prediction capabilities and identified the VEC as the most important feature with an importance of more than 0.7. In the elastic limit and at a constant value of Poisson’s ratio, the Young’s modulus is related to the bulk modulus (Eq. 1) and hence we will explain the physics of the Young’s modulus dependence on VEC by exploring the physical relationship between bulk modulus and VEC53,54.

$$K = \frac{E}{3(1 - 2v)} \qquad (1)$$

Here, K, E and v are the bulk modulus, Young’s modulus and Poisson’s ratio, respectively. Gilman et al.53,54 reported that materials with higher valence electron density (VED; valence electrons per unit volume) possess a higher bulk modulus. The bulk modulus increases as the number of valence electrons increases and decreases as the atomic size increases. It is determined predominantly by the resistance of the valence electrons to compression. In a metallic system, electrons behave like a dense gas, or liquid, with only a very small amount of viscosity. Hence, the greater the electron density, the greater the resistance to compression, and the higher the bulk modulus and the Young’s modulus. For instance, osmium possesses a VED 17% higher than that of diamond and correspondingly exhibits a bulk modulus 4% greater53,54. Although we considered VEC instead of VED in this work, Young’s modulus still follows an upward trend with VEC for both the training and validation data sets, as presented in Fig. 3a,b. Our calculated feature importance indicates that the melting point of alloys, which is an indirect metric of bond strength34,55, has an impact on Young’s modulus, which generally increases with increasing melting temperature as presented in Fig. 3c,d. The geometrical parameter λ, which is a function of mixing entropy (ΔSmix) and the difference in atomic radii (δ), has a significant impact on Young’s modulus. The δ parameter has an impact on cohesive energy, and Young’s modulus increases with increasing cohesive energy56,57. In our case, a lower value of δ results in a higher Young’s modulus, as presented in Fig. 3e,f. The difference in atomic radius influences the distribution of alloying elements and the metallic bond energy. The electronegativity has an impact on the electron density of atoms, and a larger value of electronegativity results in a higher Young’s modulus of metallic alloys58. Additionally, larger electronegativity differences (Δχ) and higher mixing enthalpy (ΔHmix) increase the probability of formation of brittle intermetallic phases, which have a lower Young’s modulus. Therefore, these two parameters could play an important role in determining the Young’s modulus of CCAs34.
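As a worked instance of Eq. 1, the conversion from Young’s modulus to bulk modulus at fixed Poisson’s ratio can be written as a small function. The input values are illustrative and not taken from the paper. The function name is an assumption.

```python
def bulk_modulus(youngs_modulus, poisson_ratio):
    """Eq. 1: K = E / (3 (1 - 2 v)), valid for an isotropic elastic solid."""
    if not -1.0 < poisson_ratio < 0.5:
        raise ValueError("Poisson's ratio must lie between -1 and 0.5")
    return youngs_modulus / (3.0 * (1.0 - 2.0 * poisson_ratio))


# Illustrative inputs: E = 300 GPa and v = 0.3 give K of about 250 GPa.
K = bulk_modulus(300.0, 0.3)
```

At constant v the two moduli are proportional, which is why a trend of K with electron density carries over directly to E.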

Figure 3.

Impact of some prominent features on Young’s modulus. (a) Relation between Young’s modulus and VEC for training set and (b) for experimental validation set. (c) Relation between melting temperature and Young’s modulus for training set and (d) for experimental validation set. (e) Relation between the difference of atomic radii and Young’s modulus for training set and (f) for experimental validation set.

It is important to mention that Roy et al.34 predicted the Young’s modulus of low, medium and high entropy alloys composed of 5 elements using the Gradient Boosting method and, without considering the impact of VEC, found that the average melting temperature (Tm) was the most important feature. The corresponding MAE for their study was 23.59 GPa. In this study, we achieved significantly better performance (MAE = 6.15 GPa) by including VEC in the feature set. From the above discussion, we propose that VEC is the most important feature that determines the Young’s modulus of this refractory alloy system. It is therefore essential to include VEC as a key parameter in the design of new CCAs with tailored Young’s modulus.

Experimental validation

Finally, we used the trained Gradient Boosting model to predict the Young’s modulus of unseen CCAs, namely the 32 experimentally synthesized CCAs composed mostly of Mo–Ta–Ti–W–Zr. As the experimental validation alloys are all refractory alloys, we examined how the type of training set affects the prediction of Young’s modulus. When we trained the Gradient Boosting model with the larger data set containing both refractory and non-refractory alloys, the predicted Young’s modulus deviated significantly from the experimentally measured values, as presented in Fig. 4a; the predictions consistently underestimated the experimental values. In contrast, we achieved excellent predictions when only the refractory alloys were used to train the Gradient Boosting model, as presented in Fig. 4b. Only 2 predictions (alloy numbers 6 and 8) out of 32 alloys are outside the 68.3% confidence interval (± σ, where σ is the standard deviation of each prediction). Table 3 presents the experimental Young’s modulus, the mean predicted Young’s modulus, the percentage error and the standard deviation when the model was trained with refractory alloys. 26 of the alloys had errors ≤ 5%, and a few of the predictions are almost identical to the experimental values.

Figure 4.

Young's Modulus Prediction by Gradient Boosting model when trained (a) with data containing both refractory and non-refractory alloys and (b) with only refractory alloys.

Table 3.

Predicted Young's modulus with percentage of error and standard deviation from Gradient Boosting model trained with data containing refractory alloys.

Alloy number | Alloy composition (actual at.% by EDS) | Experimental Young's modulus (GPa) | Mean prediction (GPa) | Error (%) | Standard deviation ±σ (GPa)
1 | Mo85.25Ta9.52Ti2.29Zr2.94 | 257.6 | 248.2 | 3.7 | 18.9
2 | Mo82.23W1.29Ta9.46Ti3.27Zr3.36Al0.39 | 260 | 246.5 | 5.2 | 17.6
3 | Mo82.93W2Ta9.89Ti2.4Zr2.72Al0.05 | 256.3 | 247.4 | 3.5 | 18.1
4 | Mo80.67W3.3Ta10.34Ti2.45Zr3.13Al0.05Cr0.06 | 264.7 | 253.9 | 4.1 | 18.6
5 | Mo76.41W7.23Ta10.69Ti2.33Zr3.17Al0.16 | 268.9 | 255.5 | 5.0 | 17.6
6 | Mo78.92W4.27Ta10.72Ti2.7Al3.39 | 273.7 | 244.1 | 10.8 | 17.1
7 | Mo84.31W2.48Ta5.84Ti2.64Zr2.95Al1.79 | 261.2 | 245.1 | 6.2 | 17.3
8 | Mo85.25W3.05Ta5.51Ti2.28Zr3.39Al0.23Cr0.29 | 272.2 | 248.8 | 8.6 | 18.5
9 | Mo79.73W0.09Ta12.36Ti3.92Zr3.88Cr0.03 | 243.9 | 245.5 | −0.7 | 17.4
10 | Mo78.53W1.06Ta12.53Ti3.68Zr4.18Cr0.03 | 237.6 | 245.3 | −3.2 | 17.3
11 | Mo78.58W2.14Ta11.19Ti3.79Zr4.3 | 240 | 246.1 | −2.6 | 17.5
12 | Mo75.86W3.13Ta12.65Ti3.89Zr4.47 | 250.3 | 245.6 | 1.9 | 17.4
13 | Mo75.66W3.69Ta12.2Ti3.8Zr4.65 | 238.4 | 245.6 | −3.0 | 17.4
14 | Mo73.77W7.67Ta10.17Ti3.7Zr4.69 | 265.8 | 252.7 | 4.9 | 17.4
15 | Mo81.5W1.63Ta6.37Ti3.9Zr4.51Al1.96Cr0.13 | 241.1 | 244.6 | −1.4 | 17.8
16 | Mo78.86W2.93Ta7.48Ti3.69Zr5.36Cr1.68 | 257.4 | 248.0 | 3.7 | 17.5
17 | Mo79.92Ta9.87Ti4.69Zr5.45Cr0.07 | 237.1 | 245.3 | −3.5 | 17.9
18 | Mo76.31W0.41Ta9.3Ti6.22Zr7.29Al0.37Cr0.08 | 246.3 | 242.8 | 1.4 | 18.4
19 | Mo80.87W1.02Ta6.98Ti5.23Zr5.88Al0.03 | 249.7 | 246.4 | 1.3 | 18.0
20 | Mo76.47W3.17Ta8.64Ti5.25Zr6.45Cr0.02 | 247.3 | 246.0 | 0.5 | 17.5
21 | Mo73.61W5.27Ta10.49Ti4.71Zr5.93 | 240.1 | 246.2 | −2.5 | 17.5
22 | Mo71.98W6.62Ta9.97Ti5.06Zr6.32Cr0.06 | 241 | 246.2 | −2.2 | 17.3
23 | Mo80.03W1.49Ta4.47Ti5.24Zr6.01Al2.73Cr0.04 | 240.4 | 236.9 | 1.4 | 19.8
24 | Mo78.09W3.06Ta4.93Ti4.92Zr7.9Cr1.1 | 243.6 | 245.7 | −0.9 | 18.7
25 | Mo81.65W0.17Ta18.12Ti0.05 | 260.1 | 255.1 | 1.9 | 16.3
26 | Mo78.35W1.61Ta20.03 | 266.1 | 254.8 | 4.2 | 15.9
27 | Mo76.96W2.93Ta20Ti0.1 | 267.1 | 255.3 | 4.4 | 15.7
28 | Mo75.99W3.83Ta20.18 | 270.8 | 256.1 | 5.4 | 16.1
29 | Mo76.32W3.14Ta20.48Ti0.05Cr0.01 | 255.8 | 255.3 | 0.2 | 15.7
30 | Mo74.54W4.2Ta21.25 | 270.4 | 257.0 | 5.0 | 15.3
31 | Mo80.97W3.88Ta14.61Zr0.04Al0.49 | 265.3 | 254.8 | 4.0 | 16.3
32 | Mo77.21W4.17Ta17.69Ti0.34Zr0.07Al0.1Cr0.41 | 272.8 | 255.3 | 6.4 | 15.7
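The error column of Table 3 can be recomputed from the experimental and predicted values. The sketch below does this for two rows copied from the table (alloys 6 and 9). The sign convention, experimental minus predicted divided by experimental, is inferred from the tabulated values and is not stated in the text. The function names are assumptions.

```python
# (alloy number, experimental E, mean prediction, sigma), all in GPa, from Table 3.
rows = [
    (6, 273.7, 244.1, 17.1),
    (9, 243.9, 245.5, 17.4),
]


def percent_error(experimental, predicted):
    """Signed error in percent; positive when the model underestimates."""
    return 100.0 * (experimental - predicted) / experimental


def within_one_sigma(experimental, predicted, sigma):
    """True if the measured value lies inside the prediction's +/- sigma band."""
    return abs(experimental - predicted) <= sigma


summary = {
    alloy: (round(percent_error(exp, pred), 1), within_one_sigma(exp, pred, sigma))
    for alloy, exp, pred, sigma in rows
}
```

Because the tabulated predictions are rounded to one decimal place, recomputed percentages for other rows can differ from the table in the last digit.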

From Fig. 4 and Table 3 we conclude that the quality of the training data is very important to predict the target property accurately. We have a larger training set (154 alloys) with refractory and non-refractory alloys. On the other hand, we have a smaller training set (96 alloys) only with refractory alloys. Since the training set was more homogeneous for the smaller data set, we achieved better predictions. Moreover, the predicted Young’s modulus followed the trend with the experimental Young’s modulus with some exceptions as presented in Fig. 4b. Therefore, it is not only the size of the training data but also the quality and relevance of the training data that are important for better predictions.

Conclusion

We have presented an approach that uses ML with high throughput experimental synthesis and mechanical testing of alloys to predict the Young’s modulus of CCAs reliably. We conclude that among the eight ML models we used, Gradient Boosting had the best predictive strength. The prediction of Young’s modulus was influenced by the model chosen and by the composition of training data. Our experimental validation set was composed of refractory alloys, and when the models were trained with data containing only refractory alloys, the predictions were closer to the experimental values. This shows that when training ML models to predict characteristics of alloys, it is advantageous to include alloys of similar composition in the training data set. The valence electron concentration is the most important feature governing the Young’s modulus of refractory CCAs and can be used to rapidly screen alloys. Since feature importance also appears to be influenced by the choice of training data set, it is important to choose carefully the training data set based on the type of alloy being studied and validate against high-quality experimental data of known provenance. The integration of experimental synthesis and testing, machine learning, and physics-based interpretation demonstrated in this work holds considerable promise for alloy design and property prediction.

Acknowledgements

This effort was principally supported by the U.S. Department of Energy's (DOE) Office of Energy Efficiency and Renewable Energy (EERE) under the Advanced Manufacturing Office (Project WBS 2.1.0.19) through Ames Laboratory, which is operated for the U.S. DOE by Iowa State University under contract DE-AC02-07CH11358. HK was supported in part through the National Science Foundation (NSF) Mathematical Sciences Graduate Internship (MSGI) Program sponsored by the NSF Division of Mathematical Sciences. This program is administered by the Oak Ridge Institute for Science and Education (ORISE) through an interagency agreement between the U.S. Department of Energy (DOE) and NSF. ORISE is managed for DOE by ORAU. This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of its employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.

Author contributions

H.K. and M.F.N.T. contributed equally to this work as first authors. They performed the dataset construction and data analysis and wrote the manuscript. G.O. synthesized and characterized the validation data set. A.R., G.B., G.O., J.C., and D.J. oversaw the results and discussion and reviewed the manuscript. R.D. provided technical expertise on CCA data, extracted data from research articles, oversaw the results and reviewed the manuscript.

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Code availability

The codes that support the findings of this study are available from the corresponding author upon reasonable request.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Hrishabh Khakurel and M. F. N. Taufique.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-021-96507-0.

References

  • 1. Huang SC, et al. Mechanical properties of zirconium-based random alloys: Alloying elements and composition dependencies. Comput. Mater. Sci. 2017;127:60–66. doi: 10.1016/j.commatsci.2016.10.028.
  • 2. Inoue A, et al. Development and applications of highly functional Al-based materials by use of metastable phases. Mater. Res. 2015;18:1414–1425. doi: 10.1590/1516-1439.058815.
  • 3. Abdelaziz MH, Paradis M, Samuel AM, Doty HW, Samuel FH. Effect of aluminum addition on the microstructure, tensile properties, and fractography of cast Mg-based alloys. Ann. Mater. Sci. Eng. 2017;2:1–10.
  • 4. Schinhammer M, Hänzi AC, Löffler JF, Uggowitzer PJ. Design strategy for biodegradable Fe-based alloys for medical applications. Acta Biomater. 2010;6:1705–1713. doi: 10.1016/j.actbio.2009.07.039.
  • 5. Long H, Mao S, Liu Y, Zhang Z, Han X. Microstructural and compositional design of Ni-based single crystalline superalloys—A review. J. Alloy. Compd. 2018;743:203–220. doi: 10.1016/j.jallcom.2018.01.224.
  • 6. Hayama AOF, et al. Effects of composition and heat treatment on the mechanical behavior of Ti–Cu alloys. Mater. Des. 2014;55:1006–1013. doi: 10.1016/j.matdes.2013.10.050.
  • 7. Yeh JW, et al. Nanostructured high-entropy alloys with multiple principal elements: Novel alloy design concepts and outcomes. Adv. Eng. Mater. 2004;6:299–303. doi: 10.1002/adem.200300567.
  • 8. Cantor B, Chang ITH, Knight P, Vincent AJB. Microstructural development in equiatomic multicomponent alloys. Mater. Sci. Eng. A. 2004;375–377:213–218. doi: 10.1016/j.msea.2003.10.257.
  • 9. Yim D, Kim HS. Fabrication of the high-entropy alloys and recent research trends: A review. Korean J. Met. Mater. 2017;55:671–683.
  • 10. Ren B, et al. Corrosion behavior of CuCrFeNiMn high entropy alloy system in 1 M sulfuric acid solution. Mater. Corros. 2012;63:828–834. doi: 10.1002/maco.201106072.
  • 11. Kang YB, Shim SH, Lee KH, Hong SI. Dislocation creep behavior of CoCrFeMnNi high entropy alloy at intermediate temperatures. Mater. Res. Lett. 2018;6:689–695. doi: 10.1080/21663831.2018.1543731.
  • 12. Fu ZQ, MacDonald BE, Monson TC. Influence of heat treatment on microstructure, mechanical behavior, and soft magnetic properties in an fcc-based Fe29Co28Ni29Cu7Ti7 high-entropy alloy. J. Mater. Res. 2018;33:2214–2222. doi: 10.1557/jmr.2018.161.
  • 13. Tikhonovsky MA, Salishchev GA, Yurchenko NY, Stepanov ND, Zherebtsov SV. Aging behavior of the HfNbTaTiZr high entropy alloy. Mater. Lett. 2018;211:87–90. doi: 10.1016/j.matlet.2017.09.094.
  • 14. Qiu Y, et al. A lightweight single-phase AlTiVCr compositionally complex alloy. Acta Mater. 2017;123:115–124. doi: 10.1016/j.actamat.2016.10.037.
  • 15. Jensen JK, et al. Characterization of the microstructure of the compositionally complex alloy Al1Mo0.5Nb1Ta0.5Ti1Zr1. Scr. Mater. 2016;121:1–4. doi: 10.1016/j.scriptamat.2016.04.017.
  • 16. Ye YF, Wang Q, Lu J, Liu CT, Yang Y. High-entropy alloy: Challenges and prospects. Mater. Today. 2016;19:349–362. doi: 10.1016/j.mattod.2015.11.026.
  • 17. Miracle DB, Senkov ON. A critical review of high entropy alloys and related concepts. Acta Mater. 2017;122:448–511. doi: 10.1016/j.actamat.2016.08.081.
  • 18. Ma D, Grabowski B, Körmann F, Neugebauer J, Raabe D. Ab initio thermodynamics of the CoCrFeMnNi high entropy alloy: Importance of entropy contributions beyond the configurational one. Acta Mater. 2015;100:90–97. doi: 10.1016/j.actamat.2015.08.050.
  • 19. Zhang C, Zhang F, Chen S, Cao W. Computational thermodynamics aided high-entropy alloy design. JOM. 2012;64:839–845.
  • 20. Jiang C, Uberuaga BP. Efficient ab initio modeling of random multicomponent alloys. Phys. Rev. Lett. 2016;116:105501. doi: 10.1103/PhysRevLett.116.105501.
  • 21. Saal JE, Berglund IS, Sebastian JT, Liaw PK, Olson GB. Equilibrium high entropy alloy phase stability from experiments and thermodynamic modeling. Scr. Mater. 2017;146:5–8. doi: 10.1016/j.scriptamat.2017.10.027.
  • 22. Lederer Y, Toher C, Vecchio KS, Curtarolo S. The search for high entropy alloys: A high-throughput ab-initio approach. Acta Mater. 2018;159:364–383. doi: 10.1016/j.actamat.2018.07.042.
  • 23. Sanchez JM, Vicario I, Albizuri J, Guraya T, Garcia JC. Phase prediction, microstructure and high hardness of novel light-weight high entropy alloys. J. Mater. Res. Technol. 2018;424:1–9.
  • 24. Tapia AJSF, Yim D, Kim HS, Lee BJ. An approach for screening single phase high-entropy alloys using an in-house thermodynamic database. Intermetallics. 2018;101:56–63. doi: 10.1016/j.intermet.2018.07.009.
  • 25. Senkov ON, Miller JD, Miracle DB, Woodward C. Accelerated exploration of multi-principal element alloys with solid solution phases. Nat. Commun. 2015;6:6529. doi: 10.1038/ncomms7529.
  • 26. Bojarski M, et al. End to end learning for self-driving cars. Preprint at arXiv:1604.07316 (2016).
  • 27. He K, Zhang X, Ren S, Sun J. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In: Bajcsy R, Hager G, editors. 2015 IEEE International Conference on Computer Vision (ICCV). IEEE; 2015. pp. 1026–1034.
  • 28. Pazzani M, Billsus D. Learning and revising user profiles: The identification of interesting web sites. Mach. Learn. 1997;27:313–331. doi: 10.1023/A:1007369909943.
  • 29. Chan PK, Stolfo SJ. Toward scalable learning with non-uniform class and cost distributions: A case study in credit card fraud detection. In: Agrawal R, Stolorz P, Piatetsky G, editors. KDD’98 Proc. Fourth International Conference on Knowledge Discovery and Data Mining. AAAI Press; 1998. pp. 164–168.
  • 30. Rickman JM, Balasubramanian G, Marvel CJ, Chan HM, Burton M-T. Machine learning strategies for high-entropy alloys. J. Appl. Phys. 2020;128:221101. doi: 10.1063/5.0030367.
  • 31. Singh P, Sharma A, Smirnov AV, Diallo MS, Ray P, Balasubramanian G, Johnson DD. Design of high-strength refractory complex solid-solution alloys. npj Comput. Mater. 2018;4:16. doi: 10.1038/s41524-018-0072-0.
  • 32. Singh P, Smirnov AV, Alam A, Johnson DD. First-principles prediction of incipient order in arbitrary high-entropy alloys: Exemplified in Ti0.25CrFeNiAlx. Acta Mater. 2020;189:248–254. doi: 10.1016/j.actamat.2020.02.063.
  • 33. Singh P, et al. Vacancy-mediated complex phase selection in high entropy alloys. Acta Mater. 2020;194:540–546. doi: 10.1016/j.actamat.2020.04.063.
  • 34. Roy A, Babuska T, Krick B, Balasubramanian G. Machine learned feature identification for predicting phase and Young’s modulus of low-, medium- and high-entropy alloys. Scr. Mater. 2020;185:152–158. doi: 10.1016/j.scriptamat.2020.04.016.
  • 35. Islam N, Huang W, Zhuang HL. Machine learning for phase selection in multi-principal element alloys. Comput. Mater. Sci. 2018;150:230–235. doi: 10.1016/j.commatsci.2018.04.003.
  • 36. Senkov O, Miracle D, Chaput K, Couzinie J. Development and exploration of refractory high entropy alloys—A review. J. Mater. Res. 2018;33:3092–3128. doi: 10.1557/jmr.2018.153.
  • 37. Li W, Liu P, Liaw PK. Microstructures and properties of high-entropy alloy films and coatings: A review. Mater. Res. Lett. 2018;6(4):199–229. doi: 10.1080/21663831.2018.1434248.
  • 38. Couzinié J-P, Senkov ON, Miracle DB, Dirras G. Comprehensive data compilation on the mechanical properties of refractory high-entropy alloys. Data Brief. 2018;21:1622–1641. doi: 10.1016/j.dib.2018.10.071.
  • 39. Fang S, Xiao X, Xia L, Li W, Dong Y. Relationship between the widths of supercooled liquid regions and bond parameters of Mg-based bulk metallic glasses. J. Non-Cryst. Solids. 2003;321:120–125. doi: 10.1016/S0022-3093(03)00155-8.
  • 40. Guo S, Ng C, Lu J, Liu CT. Effect of valence electron concentration on stability of fcc or bcc phase in high entropy alloys. J. Appl. Phys. 2011;109:103505. doi: 10.1063/1.3587228.
  • 41. Takeuchi A, Inoue A. Classification of bulk metallic glasses by atomic size difference, heat of mixing and period of constituent elements and its application to characterization of the main alloying element. Mater. Trans. 2005;46:2817–2829. doi: 10.2320/matertrans.46.2817.
  • 42. Yang X, Zhang Y. Prediction of high-entropy stabilized solid-solution in multi-component alloys. Mater. Chem. Phys. 2012;132:233–238. doi: 10.1016/j.matchemphys.2011.11.021.
  • 43. Singh AK, Kumar N, Dwivedi A, Subramaniam A. A geometrical parameter for the formation of disordered solid solutions in multi-component alloys. Intermetallics. 2014;53:112–119. doi: 10.1016/j.intermet.2014.04.019.
  • 44. Senkov ON, Wilks GB, Miracle DB, Chuang CP, Liaw PK. Refractory high-entropy alloys. Intermetallics. 2010;18:1758–1765. doi: 10.1016/j.intermet.2010.05.014.
  • 45. Breiman L. Arcing the edge. Technical Report 486. Statistics Department, University of California, Berkeley; 1997.
  • 46. Pedregosa F, et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011;12:2825–2830.
  • 47. Chen T, Guestrin C. XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016. pp. 785–794.
  • 48. Mamun O, Wenzlick M, Hawk J, et al. A machine learning aided interpretable model for rupture strength prediction in Fe-based martensitic and austenitic alloys. Sci. Rep. 2021;11:5466. doi: 10.1038/s41598-021-83694-z.
  • 49. Mamun O, Wenzlick M, Sathanur A, et al. Machine learning augmented predictive and generative model for rupture life in ferritic and austenitic steels. npj Mater. Degrad. 2021;5:20. doi: 10.1038/s41529-021-00166-5.
  • 50. Schapire RE. The strength of weak learnability. Mach. Learn. 1990;5:197–227.
  • 51. Friedman JH. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001;29:1189–1232. doi: 10.1214/aos/1013203451.
  • 52. Freund Y, Schapire RE. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997;55:119–139. doi: 10.1006/jcss.1997.1504.
  • 53. Gilman JJ. Electronic Basis of the Strength of Materials, Chapter 12. Cambridge University Press; 2003.
  • 54. Gilman JJ, Cumberland RW, Kaner RB. Design of hard crystals. Int. J. Refract. Met. Hard Mater. 2006;24:1–5. doi: 10.1016/j.ijrmhm.2005.05.015.
  • 55. Rickman JM. Data analytics and parallel-coordinate materials property charts. npj Comput. Mater. 2018;4:5. doi: 10.1038/s41524-017-0061-8.
  • 56. Roy A, Sreeramagiri P, Babuska T, Krick B, Ray PK, Balasubramanian G. Lattice distortion as an estimator of solid solution strengthening in high-entropy alloys. Mater. Charact. 2021;172:110877. doi: 10.1016/j.matchar.2021.110877.
  • 57. Pettifor DG. Electron theory of metals. In: Cahn RW, Haasen P, editors. Physical Metallurgy. Elsevier; 1983.
  • 58. Li K, Kang C, Xue D. Electronegativity calculation of bulk modulus and band gap of ternary ZnO-based alloys. Mater. Res. Bull. 2012;47:2902–2905. doi: 10.1016/j.materresbull.2012.04.115.

Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
