ACS Omega. 2022 Jan 5;7(2):2429–2437. doi: 10.1021/acsomega.1c06481

Development of Prediction Models for the Self-Accelerating Decomposition Temperature of Organic Peroxides

Toshiharu Morishita 1, Hiromasa Kaneko 1,*
PMCID: PMC8771957  PMID: 35071930

Abstract

Thermal risk assessment is important in the early stages of chemical compound development. In this study, a model to estimate the self-accelerating decomposition temperature (SADT) of organic peroxides was developed. Descriptors were calculated from the structural information of the compounds, and partial least-squares (PLS) regression and support vector regression were applied to them for temperature prediction. Molecular mechanics and density functional theory calculations were performed for structure optimization before descriptor calculation, and a genetic algorithm was used for variable selection. Structure optimization and variable selection greatly improved the prediction accuracy. The resulting PLS model, with R2 = 0.95, root mean square error = 5.1 °C, and mean absolute error = 4.0 °C, was more accurate than existing SADT prediction models.

Introduction

Thermal risk assessment is crucial in the development of chemical compounds. The self-accelerating decomposition temperature (SADT) is a key parameter characterizing the thermal risk of organic peroxides. It is the lowest temperature at which self-accelerating decomposition may occur in an organic peroxide or self-reactive substance in the packaging used for transport. It therefore determines the temperature control required to avoid thermal hazards during material storage and transport.1 Several experimental methods are available for measuring SADT; however, the associated cost, risk, and chemicals required make early-stage evaluation very difficult. A simple and accurate SADT prediction method is therefore desirable.

Previous studies have proposed quantitative structure–property relationship models to predict SADT. Wang et al.2 performed density functional theory (DFT) calculations [6-31G(d)/B3LYP] in Gaussian 09 to obtain descriptors. Geometrical descriptors (bond length, bond angle, and dipole moment) and quantum chemical descriptors [highest occupied molecular orbital (HOMO)/lowest unoccupied molecular orbital (LUMO) energies and bond dissociation energy] were used to construct multiple linear regression (MLR) and support vector regression (SVR) models to estimate SADT. He et al.3 used the semiempirical molecular orbital method (AM1) for preprocessing before geometry optimization and frequency calculations. Descriptors (excluding quantum chemical descriptors) were calculated in DRAGON 6.0, and a genetic algorithm (GA) was applied for variable selection, followed by construction of MLR and SVR models to estimate SADT. The first method suffered from a high computational load, and the second from low accuracy.

In this study, a model with high accuracy and low calculation load was developed to estimate the SADT of organic peroxides. Descriptors were calculated using the optimal molecular conformation, determined by molecular mechanics (MM) and DFT calculations. GA was used for variable selection, followed by the application of partial least-squares (PLS) regression, as a linear regression method, and SVR, as a nonlinear regression method, to predict SADT. Prediction models, including and excluding structural optimization and variable selection, were developed and analyzed.

Methods

Data Set

The data set included 65 organic compounds with molecular weights of 90.14–571.00 and SADTs of −5.0 to 196.5 °C. The SADTs were obtained from the literature (refs 2–4) and had been determined using different calorimetric methods (TG-DSC and C80); however, as previously reported, SADT is independent of the determination method used.4 The compounds included commonly used organic peroxides, such as dialkyl peroxides, diacyl peroxides, hydroperoxides, peroxyesters, ketone peroxides, peroxycarbonates, and diperoxides. The organic peroxides used here and their experimental SADTs are listed in Table 1. The data set was divided into a training set (52 samples) and a test set (13 samples), following the method reported in a previous publication.3

Table 1. Compounds and Their Experimental SADTs.

no. compound name CAS no. MW SADT [°C]
1 tert-butyl hydroperoxide 75-91-2 90.14 120.4
2 cumyl hydroperoxide 80-15-9 152.21 79.0
3 dicumyl peroxide 80-43-3 270.40 77.8
4 p-menthane hydroperoxide 80-47-7 172.30 73.5
5 dibenzoyl peroxide 94-36-0 242.24 80.0
6 diisopropyl peroxydicarbonate 105-64-6 206.22 5.0
7 tert-butyl peroxyacetate 107-71-1 132.18 65.0
8 tert-butyl peroxyisobutyrate 109-13-7 160.24 30.0
9 di-tert-butyl peroxide 110-05-4 146.26 80.9
10 diacetyl peroxide 110-22-5 118.10 35.0
11 disuccinic acid peroxide 123-23-9 234.18 25.0
12 bis(2,4-dichlorobenzoyl)peroxide 133-14-2 380.00 60.0
13 tert-butyl peroxybenzoate 614-45-9 194.25 65.8
14 bis(1-oxononyl)peroxide 762-13-0 314.52 20.0
15 butyl 4,4-bis(tert-butylperoxy)pentanoate 995-33-5 334.51 55.0
16 2,5-bis-(t-butylperoxy)-2,5-dimethyl-3-hexyne 1068-27-5 286.46 84.8
17 methyl ethyl ketone peroxide 1338-23-4 210.26 60.0
18 dicyclohexyl peroxydicarbonate 1561-49-5 286.36 25.0
19 2,3-dimethyl-2,3-diphenylbutane 1889-67-4 238.40 196.5
20 tert-butyl peroxy isopropylcarbonate 2372-21-6 176.24 62.2
21 tert-butyl peroxy diethylacetate 2550-33-6 188.30 35.0
22 2,5-dimethyl-2,5-di-(benzoylperoxy)hexane 2618-77-1 386.48 69.0
23 tert-butyl peroxy-2-ethylhexanoate 3006-82-4 216.36 35.0
24 1,1-bis-(tert-butylperoxy)cyclohexane 3006-86-8 260.42 60.0
25 2,5-dimethyl-2,5-bis(hydroperoxy)hexane 3025-88-5 178.26 105.0
26 bis(2-chlorobenzoyl)peroxide 3033-73-6 311.12 51.3
27 dipropionyl peroxide 3248-28-0 146.16 30.0
28 diisobutyryl peroxide 3437-84-1 174.22 0.0
29 tert-butyl cumyl peroxide 3457-61-2 208.33 77.1
30 1,1-bis(tert-butylperoxy)-3,3,5-trimethylcyclohexane 6731-36-8 302.51 60.0
31 3,4-dimethyl-3,4-diphenylhexane 10192-93-5 266.46 158.6
32 cyclohexanone peroxide 12262-58-7 246.34 80.0
33 tert-butyl 3,5,5-trimethylperoxyhexanoate 13122-18-4 230.39 24.0
34 bis(4-tert-butylcyclohexyl)peroxydicarbonate 15520-11-3 398.60 40.0
35 di(n-propyl)peroxydicarbonate 16066-38-9 206.22 –5.0
36 di-n-butyl peroxydicarbonate 16215-49-9 234.28 5.0
37 di-sec-butyl peroxydicarbonate 19910-65-7 234.28 0.0
38 α,α-dimethylbenzyl peroxypivalate 23383-59-7 236.34 15.0
39 bis(tert-butyl peroxyisopropyl)benzene 25155-25-3 338.54 80.8
40 dihexadecyl peroxodicarbonate 26322-14-5 571.00 37.5
41 cumyl peroxyneodecanoate 26748-47-0 306.49 7.8
42 di-isopropylbenzene hydroperoxide 26762-93-6 196.32 65.0
43 acetylacetone peroxide 37187-22-7 230.24 64.7
44 tert-butyl peroxy-2-ethylhexyl carbonate 34443-12-4 246.39 51.0
45 2,4,4-trimethylpentyl-2-peroxyneodecanoate 51240-95-0 314.52 18.1
46 bis(3-methoxybutyl)peroxydicarbonate 52238-68-3 294.34 15.0
47 di(2-ethoxyethyl)peroxydicarbonate 52373-74-7 266.28 10.0
48 diacetone alcohol peroxide 54693-46-8 230.34 50.0
49 tert-amyl peroxyneodecanoate 68299-16-1 258.45 10.8
50 tert-amyl peroxy 2-ethylhexyl carbonate 70833-40-8 260.42 55.0
51 t-hexyl peroxide benzoate 124350-67-0 222.31 62.2
52 cumyl peroxyneoheptanoate 130097-36-8 278.38 10.0
53 dioctanoyl peroxide 762-16-3 286.46 25.9
54 bis(3,5,5-trimethylhexanoyl)peroxide 3851-87-4 314.52 20.0
55 tert-butyl peroxyneodecanoate 26748-41-4 258.40 15.0
56 tert-amyl peroxypivalate 29240-17-3 188.30 21.6
57 bis-(2-ethylhexyl)peroxydicarbonate 16111-62-9 346.52 15.4
58 dimyristyl peroxydicarbonate 53220-22-7 514.88 19.2
59 tert-butyl peroxypivalate 927-07-1 174.27 27.0
60 di-lauroyl peroxide 105-74-8 398.70 46.0
61 di-decanoyl peroxide 762-12-9 342.58 31.0
62 2,5-bis-(2-ethylhexanoylperoxy)-2,5-dimethylhexane 13052-09-0 430.70 38.6
63 tert-amyl peroxy-2-ethylhexanoate 686-31-7 230.39 35.0
64 2,2-bis(tert-butylperoxy)butane 2167-23-9 234.38 70.0
65 di-tert-amyl peroxide 10508-09-5 174.32 70.4

Geometry Optimization

Molecular structures of the 65 compounds were prepared in the molfile format. Some descriptors depend on the molecular structure, which makes geometry optimization very important. Auto geometry optimization in Avogadro,5 a molecular modeling program for quantum chemical calculations, was performed for each molfile. The basic MM potential energy function includes bonded terms (for interactions between covalently bonded atoms) and nonbonded terms (for long-range electrostatic and van der Waals forces). Here, the universal force field (UFF) was used to refine bond lengths and bond angles and obtain a minimum-energy conformation. Performing the MM calculations first improved DFT convergence and reduced the computational load. The molecular structures were then optimized by DFT calculations using Firefly (PC GAMESS),6 run via MoCalc2012.7 The “Geometry optimization” job type was used with the hybrid density functional B3LYP and the 6-31G basis set. The B3LYP hybrid functional approximates the exchange–correlation energy functional in DFT by combining exact exchange from Hartree–Fock theory with exchange and correlation from other sources. The polarization-augmented form of 6-31G, 6-31G(d), is commonly used for organic compounds. Here, 6-31G was chosen to reduce the computational load and to examine the effect of omitting the d-type polarization functions.

Descriptor Calculation and Selection

After structural optimization, 5666 and 552 molecular descriptors of the organic peroxides were calculated with alvaDesc 2.0.8 (ref 8) and CODESSA 3,9 respectively; most of them were irrelevant to this study. These programs calculate molecular descriptors and fingerprints from structural information. Descriptors with small standard deviations or strong multicollinearity were eliminated. CODESSA was used to calculate the quantum chemical descriptors, such as HOMO/LUMO energies, and GA was used to find a descriptor set for model construction.
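The elimination step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `filter_descriptors` and both thresholds (`std_tol`, `corr_max`) are assumptions, since the paper does not state the cutoffs it used.

```python
import numpy as np


def filter_descriptors(X, std_tol=1e-8, corr_max=0.95):
    """Return the column indices kept after dropping near-constant and
    highly correlated descriptors (thresholds are assumed values)."""
    X = np.asarray(X, dtype=float)
    # Step 1: drop descriptors whose standard deviation is (almost) zero.
    candidates = [j for j in range(X.shape[1]) if X[:, j].std() > std_tol]
    # Step 2: greedily drop any descriptor that correlates too strongly
    # with a descriptor that has already been kept.
    selected = []
    for j in candidates:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) <= corr_max
               for k in selected):
            selected.append(j)
    return selected


# Synthetic demo: column 1 is constant, column 2 nearly duplicates column 0.
rng = np.random.default_rng(0)
a = rng.normal(size=30)
b = rng.normal(size=30)
X_demo = np.column_stack([a, np.ones(30), 2.0 * a + 1e-3 * b, b])
kept = filter_descriptors(X_demo)
print(kept)
```

The greedy pass keeps the first descriptor of each correlated group, so the result depends on column order; other tie-breaking rules are equally plausible.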

GA, applied here as a variable selection method, is an iterative procedure that progressively improves the value of a fitness function, from which the fitness score of each descriptor set (which determines its probability of selection) is calculated. An initial population of descriptor sets (a few hundred sets) is selected at random or heuristically. At each iteration, a fitness value is calculated and assigned to every descriptor set, and selection probabilities proportional to fitness are used to select a new population of descriptor sets. Selection alone cannot generate a new point in the search space; thus, GA additionally uses crossover and mutation to generate new descriptor sets. Here, the coefficient of determination (R2) after fivefold cross-validation was used as the fitness function in PLS and SVR modeling; the resulting procedures are labeled GA-PLS and GA-SVR, respectively. The population size was 100, the crossover probability was 0.5, the mutation probability was 0.2, and the maximum number of generations was 200.

Regression and Validation

Here, PLS and SVR were used to develop the prediction models. PLS is a statistical linear regression method used to find fundamental relations between explanatory variables (X) and response variables (Y). It is widely used when the number of explanatory variables is significantly larger than the number of samples. Decompositions of X and Y, as shown below, were constructed to maximize the covariance between the latent variables (T) and response variables (Y).

$$ \mathbf{X} = \sum_{a=1}^{A} \mathbf{t}_a \mathbf{p}_a^{\mathrm{T}} + \mathbf{E} \tag{1} $$
$$ \mathbf{y} = \sum_{a=1}^{A} \mathbf{t}_a q_a + \mathbf{f} \tag{2} $$

where A is the number of latent variables, t_a is the ath latent variable, p_a is the ath loading, q_a is the weight on the ath latent variable, and E and f are the error terms of X and y that are not explained by the latent variables. p_a and q_a were calculated to minimize the sum of squared errors. The number of latent variables giving the highest R2 in fivefold cross-validation was used in the prediction model.

Variable importance in projection (VIP)10 scores were calculated for each descriptor to identify the variables that contributed significantly to the prediction. As shown below, the VIP score of the jth X variable is defined as the sum over latent variables of its PLS weight value (wij), weighted by the percentage of Y variance explained by each latent variable.

[Equation 3: available only as an image (ao1c06481_m003.jpg) in the source]

where h is the number of latent variables, wi is the weight vector, and R2(y, ta) is the percentage of explained Y variance.

SVR is also a regression method, performing linear and nonlinear regression using the kernel trick, implicitly mapping inputs into high-dimensional feature spaces. If x(i) is the explanatory variable for the ith sample, the response variable, f(x(i)), is expressed as follows

[Equations 4 and 5: available only as images (ao1c06481_m004.jpg, ao1c06481_m005.jpg) in the source]

where b is a constant term, w is the weight vector, and k is the number of dimensions. The error function is expressed as follows

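$$ E = \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \max\left(0,\ \left| y^{(i)} - f\left(\mathbf{x}^{(i)}\right) \right| - \varepsilon \right) \tag{6} $$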

where C is the regularization parameter, n is the number of training samples, and ε specifies the epsilon tube within which no penalty is associated with the loss function.

The RBF kernel, a kernel function used in machine learning, shown below, was used.

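$$ K\left(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}\right) = \exp\left(-\gamma \left\lVert \mathbf{x}^{(i)} - \mathbf{x}^{(j)} \right\rVert^{2}\right) \tag{7} $$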

where γ is the RBF kernel parameter. SVR includes three hyperparameters (C, ε, and γ) to be provided before model construction.
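The roles of γ and ε can be made concrete with a small numerical sketch. It assumes the usual forms of the two quantities: the RBF kernel as exp(−γ‖x(i) − x(j)‖²) and the ε-insensitive loss as max(0, |y − f(x)| − ε). The function names and the example numbers are illustrative only.

```python
import numpy as np


def rbf_kernel(xi, xj, gamma):
    """RBF kernel value between two samples."""
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


def eps_insensitive_loss(y_true, y_pred, eps):
    """Per-sample loss: zero inside the epsilon tube, linear outside it."""
    resid = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, resid - eps)


k_same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5)  # identical samples
k_far = rbf_kernel([0.0, 0.0], [3.0, 4.0], gamma=0.5)   # squared distance 25
losses = eps_insensitive_loss([10.0, 10.0], [10.05, 12.0], eps=0.1)
print(k_same, k_far, losses)
```

A larger γ makes the kernel decay faster with distance, and a larger ε widens the tube inside which prediction errors carry no penalty.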

Model development and descriptor selection were performed using Python 3.7. Scikit-learn,11 a machine learning library in Python, was used for the PLS and SVR calculations. GridSearchCV,12 a parameter estimator in Python, was used for hyperparameter optimization. Fast optimization of hyperparameters was implemented following the procedure adopted by a previous publication.13
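A plain grid search over the three SVR hyperparameters can be sketched with `GridSearchCV` as follows. This is not the fast procedure of ref 13: it simply tries every combination. The power-of-two candidate ranges, their coarse step (chosen so the example runs quickly), the scaling of X only, and the synthetic data are all assumptions made for this example.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Candidate values as powers of two (assumed ranges, coarsened for speed).
param_grid = {
    "svr__C": 2.0 ** np.arange(-5, 11, 3),
    "svr__epsilon": 2.0 ** np.arange(-10, 1, 3),
    "svr__gamma": 2.0 ** np.arange(-20, 11, 4),
}

# Autoscale the descriptors, then fit an RBF-kernel SVR.
pipe = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
search = GridSearchCV(
    pipe,
    param_grid,
    scoring="r2",
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)

# Synthetic demo data with a nonlinear term.
rng = np.random.default_rng(0)
X_demo = rng.uniform(-2.0, 2.0, size=(60, 3))
y_demo = 2.0 * X_demo[:, 0] + X_demo[:, 1] ** 2 + 0.1 * rng.normal(size=60)
search.fit(X_demo, y_demo)
print(search.best_params_, round(search.best_score_, 3))
```

Only the training data should be passed to `fit`; `search.predict` then applies the refitted best model to the test data.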

Hyperparameter optimization and descriptor selection were performed using only the training data set, and the performance of the resulting model was then evaluated on the test data set. Model accuracy was evaluated using common statistical parameters: root mean square error (RMSE), mean absolute error (MAE), and R2. These parameters were calculated as follows

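$$ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y^{(i)} - \hat{y}^{(i)}\right)^{2}} \tag{8} $$
$$ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left| y^{(i)} - \hat{y}^{(i)} \right| \tag{9} $$
$$ R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y^{(i)} - \hat{y}^{(i)}\right)^{2}}{\sum_{i=1}^{n}\left(y^{(i)} - \bar{y}\right)^{2}} \tag{10} $$

where n is the number of samples, y^(i) is the measured SADT of the ith sample, ŷ^(i) is its predicted value, and ȳ is the mean of the measured values.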
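The three statistics can be computed directly from measured and predicted values, as in the sketch below. It uses the standard definitions of RMSE, MAE, and R2; the function name and the four example values are made up for illustration and are not data from the paper.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R2) for measured and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    mae = float(np.mean(np.abs(resid)))
    r2 = float(1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2))
    return rmse, mae, r2


# Hypothetical measured and predicted SADTs in degrees Celsius.
rmse, mae, r2 = regression_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 39.0])
print(round(rmse, 3), round(mae, 3), round(r2, 3))
```

RMSE penalizes large errors more heavily than MAE, so a gap between the two points to a few poorly predicted compounds.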

Results and Discussion

Prediction models, including and excluding structural optimization and variable selection, were developed and analyzed. Additionally, prediction accuracies of existing models were compared (Table 2).

Table 2. Comparison of Predictive Performance of Existing Models for Test Data.

item | Wang2 | He3
geometry optimization | DFT B3LYP/6-31G(d) | MM+/MO PM1
frequency calculation | |
variable selection | | GA-MLR
number of descriptors | 8 | 9
modeling method | MLR / SVR | MLR / SVR
number of training data | 40 | 57
number of test data | 10 | 14
RMSE (MLR / SVR) | 12.0 / 6.43 | 9.91 / 9.79

Case 1 (Geometry Optimization: MM)

In this case, only the MM calculation was performed before model development. After preprocessing, models were developed by PLS with five latent variables and by SVR (C = 8.0, ε = 0.00098, and γ = 0.00024). A parity plot of actual versus calculated values is shown in Figure 1. For the training set, the statistical parameters were R2 = 0.90, RMSE = 12.35, and MAE = 9.45 for PLS and R2 = 0.99, RMSE = 3.92, and MAE = 0.67 for SVR. For the test set, they were R2 = 0.26, RMSE = 22.40, and MAE = 17.46 for PLS and R2 = 0.10, RMSE = 24.69, and MAE = 19.95 for SVR. The prediction performance was lower than that of the existing models. Changing the number of latent variables in PLS or the hyperparameter values in SVR did not improve the prediction accuracy.

Figure 1. Actual vs calculated values of SADT in case 1 (left: PLS, right: SVR).

Case 2 (Geometry Optimization: MM/DFT)

In this case, the MM and DFT calculations were performed before model development. After preprocessing, models were developed by PLS with 13 latent variables and by SVR (C = 4.0, ε = 0.00098, and γ = 0.00049). A parity plot of actual versus calculated values is shown in Figure 2. For the training set, the statistical parameters were R2 = 0.99, RMSE = 0.98, and MAE = 0.76 for PLS and R2 = 0.99, RMSE = 2.15, and MAE = 0.34 for SVR. For the test set, they were R2 = 0.82, RMSE = 9.80, and MAE = 7.70 for PLS and R2 = 0.77, RMSE = 11.07, and MAE = 8.68 for SVR. The prediction performance was significantly higher than in case 1 and comparable to that of the existing models. Thus, performing DFT calculations and adding quantum chemical descriptors before model building improved the prediction accuracy of both models. As in case 1, changing the number of latent variables in PLS or the hyperparameter values in SVR did not improve the prediction accuracy.

Figure 2. Actual vs calculated values of SADT in case 2 (left: PLS, right: SVR).

Case 3 (Geometry Optimization: MM/DFT, Variable Selection: GA)

In this case, in addition to the preprocessing used in case 2, the descriptors were selected using GA-PLS and GA-SVR before model building. Variable selection improved the GA fitness function from R2 = 0.69 to R2 = 0.91 for PLS and from R2 = 0.38 to R2 = 0.90 for SVR. Models were developed by PLS with 11 latent variables and by SVR (C = 466, ε = 0.02343, and γ = 0.00000714). A parity plot of actual versus calculated values is shown in Figure 3. For the training set, the statistical parameters were R2 = 0.99, RMSE = 1.57, and MAE = 1.14 for PLS and R2 = 0.99, RMSE = 3.69, and MAE = 1.78 for SVR. For the test set, they were R2 = 0.95, RMSE = 5.11, and MAE = 4.03 for PLS and R2 = 0.91, RMSE = 6.87, and MAE = 5.15 for SVR. The prediction performance improved markedly compared with case 2 and was also better than that of the existing models. Thus, appropriate selection of variables before model building improved the prediction accuracy.

Figure 3. Actual vs calculated values of SADT in case 3 (left: PLS, right: SVR).

Comparison of Prediction Accuracy between PLS and SVR

A comparison of the PLS and SVR prediction accuracies is given in Table 3. The “descriptors” row shows the change in the number of descriptors due to preprocessing, and the “variable selection” row shows the change in the number of descriptors after applying GA. For the data set used in this study, PLS gave higher prediction accuracy than SVR in all three cases. Geometry optimization using MM/DFT calculations, addition of quantum chemical descriptors, and variable selection using GA all influenced the prediction accuracy.

Table 3. Model Development Condition and Validation Results.

item | case 1 | case 2 | case 3
descriptors (before → after preprocessing) | 5889 → 2659 | 1216 + 553 → 1586 | 1216 + 553 → 1586
calculated by | alvaDesc 2 | alvaDesc 2 + CODESSA 3 | alvaDesc 2 + CODESSA 3
geometry optimization | MM (UFF) | MM (UFF), DFT (6-31G/B3LYP) | MM (UFF), DFT (6-31G/B3LYP)
variable selection | | | GA-PLS: 1586 → 559; GA-SVR: 1586 → 524
modeling method | PLS / SVR | PLS / SVR | PLS / SVR
RMSE | 22.4 / 24.7 | 9.8 / 11.1 | 5.1 / 6.9
MAE | 17.5 / 20.0 | 7.7 / 8.7 | 4.0 / 5.2
R2 | 0.26 / 0.23 | 0.82 / 0.77 | 0.95 / 0.91

Comparison of Prediction Accuracy with Existing Models

The model of Wang et al.2 had a small RMSE and high prediction accuracy; however, its computational load was very high because of the DFT calculations. In contrast, the model of He et al.,3 which used the semiempirical molecular orbital method (AM1), had a relatively large RMSE, although its computational load was low.

The prediction accuracy of the proposed model was significantly improved by the addition of quantum chemical descriptors and variable selection using GA. Changing the basis set for DFT calculation to 6-31G could reduce the computational load while maintaining the prediction accuracy.

The addition of quantum chemical descriptors and appropriate optimization of molecular conformation before calculating the descriptors were important for a model with high accuracy. However, improvement in prediction accuracy reached a ceiling at some point, so improvement in prediction accuracy should be balanced with a computational load to create an effective model (Table 4).

Table 4. Comparison with Predictive Performance of Existing Models for Test Data.

item | proposed method | Wang2 | He3
geometry optimization | MM/DFT 6-31G/B3LYP | DFT 6-31G(d)/B3LYP | MM+/MO PM1
frequency calculation | | |
variable selection | GA-PLS / GA-SVR | | GA
number of descriptors | 559 / 521 | 8 | 9
modeling method | PLS / SVR | MLR / SVR | MLR / SVR
number of training data | 52 | 40 | 57
number of test data | 13 | 10 | 14
RMSE | 5.11 / 6.87 | 12.0 / 6.43 | 9.91 / 9.79

Descriptors with High Impact on Prediction Accuracy

The 15 descriptors with the highest VIP scores are shown in Table 5. They include various descriptors related to oxygen bonding (bond order, valence, and charge) and quantum chemical descriptors (LUMO energy and repulsion/attraction energies). This result indicates that these descriptors strongly influenced the prediction of SADT.

Table 5. Top 15 Descriptors with Highest VIP Scores.

no. VIP descriptor calculated by explanation
1 2.248 AvgBondOrd_O CODESSA average bond order for all atoms of O type
2 2.231 MaxOneCent-ElecElecRepEn CODESSA maximum one-center electron–electron repulsion energy
3 2.182 SM02_EA(dm) alvaDesc spectral moment of order 2 from edge adjacency mat. weighted by dipole moment
4 2.176 SM08_EA(dm) alvaDesc spectral moment of order 8 from edge adjacency mat. weighted by dipole moment
5 2.133 MinOneCent-CoreElecAttrEn CODESSA minimum one-center core-electron attraction energy
6 2.082 SM07_EA(dm) alvaDesc spectral moment of order 7 from edge adjacency mat. weighted by dipole moment
7 2.040 MaxTwoCent-TotEn_AB CODESSA maximum two-center total energy, all bonds
8 2.021 SpMax_B(s) alvaDesc leading eigenvalue from Burden matrix weighted by I-state
9 2.004 AvgVal_O CODESSA average valence for atoms of O type.
10 1.978 MaxTwoCent-CoreElecResEn_AP CODESSA maximum two-center core-electron resonance energy, all pairs
11 1.971 B02[O–O] alvaDesc presence/absence of O–O at topological distance 2
12 1.958 AvgBondOrd_O_O CODESSA average bond order among all bonds between atoms of type O and O
13 1.949 qpmax alvaDesc maximum positive charge
14 1.907 MinTwoCent-CoreElecAttrEn_AP CODESSA minimum two-center core-electron attraction energy, all pairs
15 1.861 LUMOEn CODESSA energy of lowest energy unoccupied molecular orbital

Analyzing Effects of Preprocessing Including and Excluding DFT/GA

For no. 53, 54, 55, 59, 64, and 65, the prediction accuracy was improved by performing the DFT calculations before descriptor calculation, whereas for no. 57, 58, and 60, preprocessing by DFT calculations did not significantly improve the prediction accuracy. Additionally, no. 56, 62, and 63 showed high prediction accuracy from the beginning. In all cases except no. 62, variable selection via GA improved the prediction accuracy, but its overall contribution to the improvement was smaller than that of DFT optimization (Figure 4).

Figure 4. Comparison of prediction accuracy between PLS models.

Figure 6. Molecular structure of no. 54, bis(3,5,5-trimethylhexanoyl)peroxide.

Figure 7. Molecular structure of no. 56, tert-amyl peroxypivalate.

No. 53, 54, 56, and 58 were compared as representatives (Figures 5–8). The conformation (bond lengths and angles) near the O–O bond of each molecule changed significantly after the DFT calculation, as presented in Table 6. No. 53 and 54 exhibited larger bond length changes than no. 56 and 58 because the difference between the initial and optimal states was greater for no. 53 and 54 than for no. 56 and 58. No. 56 exhibited a low prediction error in case 1, where only MM calculations were performed, despite requiring only 18 iterations, which could be because its initial conformation was already close to optimal. A larger number of iterations tended to yield higher prediction accuracy, with some exceptions. The prediction error for no. 58 could not be reduced significantly. This could be due to DFT calculation errors because 6-31G, instead of 6-31G(d), was used as the basis set.

Figure 5. Molecular structure of no. 53, dioctanoyl peroxide.

Figure 8. Molecular structure of no. 58, dimyristyl peroxydicarbonate.

Table 6. Comparison Results for Four Representative Molecules.

element unit no. 53 no. 54 no. 56 no. 58
improvement of prediction accuracy °C 19.9 52.9 1.0 3.1
improvement of prediction accuracy % 79.6 86.2 17.3 16.6
iterations times 21 51 18 25
length (O1–O2) angstrom +0.263 +0.237 +0.162 +0.174
length (C1–O1) angstrom +0.059 +0.064 +0.020 +0.024
length (C2–O2) angstrom +0.059 +0.063 +0.065 +0.046
angle(∠C1O1O2) degree –9.9 –12.5 –7.6 –13.5
angle(∠O1O2C2) degree –9.9 –9.9 –2.9 –12.5
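The geometric quantities in Table 6 are bond lengths and bond angles, which follow directly from atomic coordinates. The sketch below shows the calculation for a C1–O1–O2 fragment. The coordinates are hypothetical values chosen for illustration, not geometries from the paper; only the final subtraction uses published numbers (the O1–O2 lengths before and after optimization listed for no. 53 in Table 7).

```python
import numpy as np


def bond_length(a, b):
    """Distance between two atoms (same unit as the coordinates)."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))


def bond_angle(a, b, c):
    """Angle a-b-c in degrees, with atom b at the vertex."""
    u = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    v = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))


# Hypothetical Cartesian coordinates in angstrom (not taken from the paper).
c1 = (-0.6, 1.2, 0.0)
o1 = (0.0, 0.0, 0.0)
o2 = (1.5, 0.0, 0.0)
length_oo = bond_length(o1, o2)
angle_c1o1o2 = bond_angle(c1, o1, o2)

# Change in the O1-O2 length of no. 53 on optimization (Table 7 values).
delta_oo = round(1.531 - 1.268, 3)
print(round(length_oo, 3), round(angle_c1o1o2, 1), delta_oo)
```

Applying the two functions to the coordinates before and after optimization and subtracting gives the signed differences reported in the table.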

Thus, appropriate optimization of molecular conformation and addition of quantum chemical descriptors strongly influence SADT prediction (Tables 7–10).

Table 7. Representative Bond Lengths and Angles before/after Optimization for No. 53.

element unit before optimization after optimization difference
O1–O2 angstrom 1.268 1.531 +0.263
C1–O1 angstrom 1.359 1.418 +0.059
C2–O2 angstrom 1.359 1.418 +0.059
∠C1O1O2 degree 124.5 114.6 –9.9
∠O1O2C2 degree 124.5 114.6 –9.9

Table 8. Representative Bond Lengths and Angles before/after Optimization for No. 54.

element unit before optimization after optimization difference
O1–O2 angstrom 1.273 1.510 +0.237
C1–O1 angstrom 1.352 1.416 +0.064
C2–O2 angstrom 1.353 1.416 +0.063
∠C1O1O2 degree 122.3 109.8 –12.5
∠O1O2C2 degree 120.5 110.4 –9.9

Table 9. Representative Bond Lengths and Angles before/after Optimization for No. 56.

element unit before optimization after optimization difference
O1–O2 angstrom 1.297 1.459 +0.162
C1–O1 angstrom 1.360 1.380 +0.020
C2–O2 angstrom 1.417 1.482 +0.065
∠C1O1O2 degree 125.0 117.4 –7.6
∠O1O2C2 degree 108.2 105.3 –2.9

Table 10. Representative Bond Lengths and Angles before/after Optimization for No. 58.

element unit before optimization after optimization difference
O1–O2 angstrom 1.272 1.446 +0.174
C1–O1 angstrom 1.354 1.378 +0.024
C2–O2 angstrom 1.352 1.398 +0.046
∠C1O1O2 degree 121.7 108.2 –13.5
∠O1O2C2 degree 121.1 108.6 –12.5

Double Cross-Validation

Here, following a previous study, the data set was divided into training and test data to conduct holdout validation of the prediction accuracy. The temperature range of the test data was 15–70 °C, and it was unclear whether the model was applicable to a wider temperature range. To verify the extrapolation (generalization performance) for a high temperature range, using a small data set, double cross-validation was conducted using case 3 data, variables, and model building conditions. In inner cross-validation, hyperparameters were determined via fivefold cross-validation, whereas in outer cross-validation, leave-one-out cross-validation was performed. Statistical parameters of the PLS model were as follows: R2 = 0.76, RMSE = 17.74, and MAE = 13.29, and those of the SVR model were as follows: R2 = 0.74, RMSE = 18.11, and MAE = 13.50. No significant difference in the prediction error at low and high temperatures was observed, and both the models evenly predicted a wide range of SADTs, with good accuracy (Figure 9).

Figure 9. Actual vs calculated values of SADT (left: PLS, right: SVR).

Conclusions

In this study, we constructed a model to estimate the SADT of organic peroxides. PLS regression and SVR were applied on the descriptors calculated using the structural information of the compounds, to predict the SADTs. MM and DFT calculations were performed before calculating the descriptors, and GA was used for variable selection. In DFT calculation, B3LYP with the 6-31G basis set was used instead of 6-31G(d), significantly improving prediction accuracy and reducing computational load. Thus, a model with higher accuracy than the existing SADT prediction models was developed.

Appropriate preprocessing and variable selection were important for a model with high accuracy, and optimizing compound conformation before descriptor calculations improved SADT prediction accuracy. However, the improvement in prediction accuracy should be balanced with a computational load to create an effective model.

In the future, to further improve prediction accuracy, we will investigate machine learning models other than SVR, preprocessing methods for descriptor calculation, and the selection of descriptors that contribute strongly to SADT prediction.

Acknowledgments

This work was supported by a Grant-in-Aid for Scientific Research (KAKENHI) (grant number 19K15352) from the Japan Society for the Promotion of Science.

Glossary

Abbreviations

SADT: self-accelerating decomposition temperature
QSPR: quantitative structure–property relationship
DFT: density functional theory
MLR: multiple linear regression
SVR: support vector regression
GA: genetic algorithm
MM: molecular mechanics
PLS: partial least squares
HOMO: highest occupied molecular orbital
LUMO: lowest unoccupied molecular orbital
VIP: variable importance in projection
RMSE: root mean square error
MAE: mean absolute error

The authors declare no competing financial interest.

Notes

Data and Software Availability: Structures of the chemical compounds were downloaded from the Chemical Book database.14 Some of the software used to calculate the optimal conformations of the molecules can be freely downloaded from the linked sites (refs 5–7). Python 3.7 was used to build the models, with reference to the source code at the linked site.15 All data underlying the results are available as part of the article, and no additional source data are required.

References

  1. Chen W.-T.; Chen W.-C.; You M.-L.; Tsai Y.-T.; Shu C.-M. Evaluation of thermal decomposition phenomenon for 1,1-bis(tertbutylperoxy)-3,3,5-trimethylcyclohexane by DSC and VSP2. J. Therm. Anal. Calorim. 2015, 122, 1125–1133. 10.1007/s10973-015-4985-2.
  2. Wang B.; Yi H.; Xu K.; Wang Q. Prediction of the self-accelerating decomposition temperature of organic peroxides using QSPR models. J. Therm. Anal. Calorim. 2017, 128, 399–406. 10.1007/s10973-016-5922-8.
  3. He P.; Pan Y.; Jiang J.-c. Prediction of the self-accelerating decomposition temperature of organic peroxide based on support vector machine. Procedia Eng. 2018, 211, 215–225. 10.1016/j.proeng.2017.12.007.
  4. Sun J.; Li Y.; Hasegawa K. A study of self-accelerating decomposition temperature (SADT) using reaction calorimetry. J. Loss Prev. Process Ind. 2001, 14, 331–336. 10.1016/S0950-4230(01)00024-9.
  5. Avogadro Home Page. https://avogadro.cc/ (accessed 2021-10-11).
  6. Firefly Computational Chemistry Program Home Page. http://classic.chem.msu.su/gran/gamess/ (accessed 2021-10-11).
  7. SourceForge MoCalc2012 Download Page. https://sourceforge.net/projects/mocalc2012/ (accessed 2021-10-11).
  8. AlvaDesc Home Page. https://www.alvascience.com/alvadesc/ (accessed 2021-10-11).
  9. Semichem Page for Codessa III. http://www.semichem.com/codessa/codessa-new.php (accessed 2021-10-11).
  10. Akarachantachote N.; Chadcham S.; Saithanu K. Cutoff Threshold of Variable Importance in Projection for Variable Selection. Int. J. Pure Appl. Math. 2014, 94, 307–322. 10.12732/ijpam.v94i3.2.
  11. Scikit-learn Home Page. https://scikit-learn.org/stable/ (accessed 2021-10-11).
  12. Scikit-learn GridSearchCV Page. https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html (accessed 2021-10-11).
  13. Kaneko H.; Funatsu K. Fast optimization of hyperparameters for support vector regression models with highly predictive ability. Chemom. Intell. Lab. Syst. 2015, 142, 64–69. 10.1016/j.chemolab.2015.01.001.
  14. Chemical Book Home Page. https://www.chemicalbook.com/ProductIndex_JP.aspx (accessed 2021-10-11).
  15. GitHub Home Page. https://github.com/hkaneko1985/dcekit (accessed 2021-10-11).

Articles from ACS Omega are provided here courtesy of American Chemical Society
