Machine Learning Densities, Detonation Velocities, and Formation Enthalpies of Energetic Materials Using Quantum Chemistry Descriptors

Patrick Kimber; James Mattock; Sophia Wheeler; John Mullaney; Alison Beardah; Justin Fellows; Kenny Jolley; Felix Plasser

doi:10.1021/acs.jctc.5c00865

2025 Aug 28;21(17):8406–8419.


PMCID: PMC12424169 PMID: 40874326

Abstract

The prediction of detonation parameters is a challenging task that requires bridging the gap between microscopic molecular features and macroscopic materials properties. Whereas traditional routes rely on empirical equations, we present a machine learning approach to this task. Our approach uses molecular descriptors from high-level quantum chemistry as input and fits models against experimental reference data for three key quantities: the crystalline density, the detonation velocity, and the heat of formation. To determine the detonation products, we use a new nonempirical product optimization scheme, maximizing the heat release, which is extensible to any molecular composition. We find, for all three properties, that the machine-learned results significantly surpass standard rule-based schemes. Finally, we present an all in silico scheme for predicting detonation velocities and show that it is almost as accurate as a scheme that uses experimental densities as input. In summary, we believe that this work is a major step toward the goal of accurately predicting detonation parameters by showing how to leverage the power of quantum chemistry for this task.


1. Introduction

Energetic materials, principally explosives, propellants, and pyrotechnics, have a wide range of uses, including both military and civilian applications. This has inevitably led to the design of such materials becoming a rapidly advancing field. Given the inherent complexities in the synthesis and analysis of energetic materials, the prediction of novel compounds is an ideal application for quantum mechanical methods. The varied applications of energetic materials have resulted in a significant body of work, as different use cases require radically different properties. Therefore, a method of accurately quantifying the properties of energetic materials is crucial. There currently exists a large array of methods and computational approaches, ranging from high-level single-molecule approaches to machine learning.

Computational approaches to modeling molecular structures and crystals have become an essential part of the materials discovery pipeline since they can be utilized as both a predictive and a diagnostic tool. In particular, in the context of energetic materials, it is desirable to screen new candidate molecules computationally, avoiding the need for specialized equipment and the risk that is invariably associated with experimental research on explosives. The detonation behavior of energetic materials is crucially influenced by their density and solid state enthalpy of formation, and these two properties have received specific emphasis in the literature. Well-established approaches for predicting the detonation properties of energetics, based around these two properties, involve the use of thermochemical codes, empirical models, and structure–activity relationships. Thermochemical codes generally require the density and enthalpy of formation as input parameters to determine detonation properties. For densities, empirical models have been developed starting from a single-molecule perspective and have been extended to include a consideration of the molecular environment. Empirical schemes for enthalpies of formation include a consideration of bond enthalpies or proceed via group additivity approaches. Finally, we note that a specific challenge in using empirical equations to compute detonation velocities is that these usually rely on an a priori determination of the detonation product distribution, a problem which we will revisit below.

Identifying energetic molecules with high densities is often the main design strategy for energetic material discovery, since empirical schemes suggest that the density has the largest influence on detonation velocity and pressure. Densities can be trivially approximated by dividing the molecular mass by the molecular volume; however, such an approach neglects all influence of the crystalline environment. A partial improvement is given by group additivity approaches. However, these neglect information regarding molecular configuration and molecular connectivity, meaning that not all isomeric structures would obtain distinct densities. Additionally, an explicit consideration of intermolecular forces is absent from group additivity approaches, and additivity schemes rely heavily on having a large amount of experimental reference data upon which to parametrize the atom and group volumes. As the availability and power of computational resources have increased, methods for simulating molecular crystals explicitly have become more readily accessible. These operate via periodic density functional theory, while cheaper methods based on force fields have also been developed. While these simulations can directly address the problems faced by additivity methods, they are not without their own challenges, and even sophisticated crystal structure prediction routines can struggle to correctly identify the crystal structures of novel organic molecules. On the other hand, quantum chemistry computations on individual molecules are clearly cheaper; however, in this case the challenge is to bridge between microscopic molecular properties and the macroscopic properties of interest.

In recent years, artificial intelligence (AI) and machine learning (ML) have emerged as attractive techniques for computational chemists, with applications in the development of new computational methods, molecular property prediction and discovery, and even the mapping of entire chemical landscapes for small organic molecules. More specifically, ML has been used to predict the properties of energetic materials. Densities have been modeled previously using features generated by cheminformatics toolkits such as RDKit. Going beyond scalar quantities (e.g., the oxygen balance), other featurization techniques have been developed to create input features by mapping structural information to vector quantities based on molecular connectivity or fingerprints. Other approaches for density ML models have considered molecular topology as a starting point for feature generation, using molecular graphs as inputs. The enthalpy of formation has been identified as an intuitive input parameter for detonation velocity models; however, computing accurate values for it is an additional challenge. Recent efforts in this regard have utilized deep learning models for gas phase formation enthalpies, building upon existing ML methods for obtaining geometries, vibrational frequencies and energies from the generative GDB-11 database.

A special challenge in the prediction of energetic materials properties is the scarcity of data, with detonation parameters available only for a few hundred molecules. This poses severe challenges for a brute-force big-data approach. Instead, we opt here for a chemically informed procedure that feeds in molecular properties which are likely to be relevant for describing solid-state interactions. The numerical values for these properties, in turn, are determined by modern quantum chemistry methods. Our overall approach is outlined in Figure 1. We will focus on three materials properties, the density (ρ), the heat of formation (ΔH_f), and the detonation velocity (D), and investigate how well they can be predicted based on quantum chemistry data. First, we will investigate basic models for all three quantities; these will be described in Section 2 along with more general background information. These basic models all possess a fairly simple functional form based on only a small number of molecular features. To enable our ML approach, we will first augment the set of features by a variety of data points that can be obtained from quantum chemistry computations. These will then be fed into our ML models to obtain solid state corrections for our results and thus improved accuracy. The ML approach will be described in Section 3. Subsequently, a detailed discussion of the results obtained with various models will be presented in Section 4. During this work, we will also put specific emphasis on how to determine the products formed during explosion and on what accuracy in terms of densities is required for a full in silico determination of the detonation velocity.

Figure 1. Flowchart of the procedure employed in this work to model the three parameters of interest: density (ρ), enthalpy of formation (ΔH_f), and detonation velocity (D).

2. Background

2.1. Densities

Using isolated molecules as a starting point, the trivial approximation for the density relies on dividing the molecular mass (M) by the molecular volume (V).

ρ_{triv} = \frac{M}{V}

This type of approach has been evaluated for a set of 180 energetic molecules by Rice et al., where the volume was taken as that enclosed by the 0.001 electrons/bohr³ isosurface of the electron density. It was found that the root-mean-square prediction error was 3.7% with a maximum error of 0.178 g cm^–3. Overall, this approach was found to perform poorly for molecules where extensive intermolecular forces, in particular hydrogen bonding, were expected in the crystalline phase.

If the crystal structure is known, then the true crystalline density can be obtained as

ρ = \frac{Z \times M}{V_{uc}}

where Z is the number of molecules per unit cell. ρ differs from ρ_triv because the unit cell volume per molecule, V_uc/Z, differs from the molecular volume V. V_uc, in turn, is crucially affected by intermolecular interactions and potentially by voids in the crystal. In light of this, the single-molecule model was later extended by Politzer et al. to include corrections based on an analysis of the molecular electrostatic potential (ESP) as a way to estimate a molecule’s propensity to form intermolecular interactions in the solid state.
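The two density expressions above differ only in the volume that is assigned to each molecule. A minimal Python sketch of both is given below; the function names, the choice of Å³ as the volume unit, and the numbers in the usage note are illustrative assumptions and are not taken from the original work.

```python
AVOGADRO = 6.02214076e23     # mol^-1
ANGSTROM3_TO_CM3 = 1.0e-24   # 1 Å^3 = 1e-24 cm^3


def trivial_density(molar_mass, molecular_volume):
    """rho_triv = M / V in g cm^-3.

    molar_mass: g mol^-1; molecular_volume: Å^3 per molecule (assumed unit).
    """
    return molar_mass / (AVOGADRO * molecular_volume * ANGSTROM3_TO_CM3)


def crystal_density(molar_mass, z, cell_volume):
    """rho = Z * M / V_uc in g cm^-3, with the unit cell volume in Å^3."""
    return z * molar_mass / (AVOGADRO * cell_volume * ANGSTROM3_TO_CM3)
```

For a hypothetical molecule with M = 222.12 g mol^–1 and V = 200 Å³, `trivial_density(222.12, 200.0)` gives roughly 1.84 g cm^–3. The two functions agree only when V_uc/Z equals V, which is exactly the case the trivial model assumes.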

2.2. Detonation Velocities

The detonation velocity, D, can be approximated via the Kamlet–Jacobs equation.

D = A ϕ^{1 / 2} (1 + B ρ_{\exp})

ϕ = n_{gas} \sqrt{{m̅}_{gas} q_{cal}}

A = 1.01, \quad B = 1.30

where A and B are empirically derived coefficients, n_gas is the number of moles of gas produced per gram of explosive (mol g^–1), m̅_gas is the average molecular mass of the gases produced (g mol^–1), and q_cal is the heat of explosion (cal g^–1). ρ_exp is the experimental loading density of the explosive (g cm^–3); it enters the Kamlet–Jacobs equation in highest order, highlighting its importance in predicting detonation parameters. For maximal D, the loading density ρ_exp should ideally be equal to or very close to the material’s crystalline (theoretical maximum) density unless there are significant safety or stability issues with handling the material at this value.
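As a small illustration, the Kamlet–Jacobs expression can be evaluated directly. The sketch below assumes the usual convention in which these coefficients yield D in km s^–1 and in which q_cal is passed as a positive amount of heat released; the function name and the input values in the usage note are illustrative only.

```python
import math

A_KJ = 1.01  # empirical coefficient A
B_KJ = 1.30  # empirical coefficient B


def kamlet_jacobs_velocity(n_gas, m_gas, q_cal, rho):
    """Detonation velocity from the Kamlet-Jacobs equation.

    n_gas: mol of gas per gram of explosive
    m_gas: average molar mass of the gaseous products (g mol^-1)
    q_cal: heat of explosion as a positive number (cal g^-1, assumed sign)
    rho:   loading density (g cm^-3)
    """
    phi = n_gas * math.sqrt(m_gas * q_cal)
    return A_KJ * math.sqrt(phi) * (1.0 + B_KJ * rho)
```

With illustrative inputs `n_gas=0.0338`, `m_gas=27.2`, `q_cal=1500.0` and `rho=1.80`, the function returns a value close to 8.8. Because ρ enters linearly while ϕ enters only as a square root, a given relative error in the density affects D more strongly than the same relative error in ϕ.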

While the density is a well-defined physical property of the material, the remaining terms in the Kamlet–Jacobs equation are, from a theoretical standpoint, traditionally determined according to rule-based schemes. The distribution of detonation products, which is a prerequisite for computing n_gas, m̅_gas, and q_cal, is frequently determined for CHNO molecules based on the Kamlet–Jacobs (KJ) rules and proceeds as follows:

(1) Convert all H atoms to H₂O(g)
(2) Convert as many C atoms as possible to CO₂(g)
(3) Convert remaining C atoms to C(s)
(4) Convert all N to N₂(g)

The Kamlet–Jacobs rules are obviously limited to CHNO species. In addition, we highlight that these rules fail to account for some molecular compositions, for example, the absence of any oxygen atoms or an excess of oxygen remaining after step 2. Other rule-based schemes have been conceptualized to account for more exotic molecular compositions. For example, the (modified) Kistiakowsky–Wilson (KW) rules can be employed depending on the oxygen balance of the structure; products for highly oxygen-deficient structures are determined differently from those with an oxygen balance of more than −40%. An extension to the KW rules was later introduced by Springall and Roberts (SR) to redistribute the original products. More recently, Sivapirakasam et al. introduced a rule-based scheme to account for a wider range of molecular compositions, including compounds which contain metal atoms, are oxygen deficient, or contain halogens. Politzer and Murray evaluated the KJ, (modified) KW and SR rules for a set of 14 compounds with positive oxygen balances and found that the KJ rules provide product sets which yield the lowest errors in detonation velocity predictions using the Kamlet–Jacobs equation. More recently, Muravyev et al. highlighted that the Kamlet–Jacobs rules for CHNO molecules generally result in product sets which optimize the enthalpy of explosion and, thus, the detonation velocity, obeying the so-called “maximal heat release principle”. In this work, we use the original Kamlet–Jacobs rules where possible but append two additional steps to account for the presence of excess oxygen or halogen atoms:

(5) Convert remaining O to O₂(g)
(6) Convert all X atoms to X₂(g)
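Steps (1) to (6) can be written as a short function. The following Python sketch is one possible reading of the rules; the function name, the dictionary keys, and the decision to raise an error when there is too little oxygen for step (1) are assumptions made for illustration, since the rules themselves do not cover that case.

```python
def kamlet_jacobs_products(n_c, n_h, n_n, n_o, n_x=0):
    """Moles of products per mole of explosive following steps (1)-(6)."""
    h2o = n_h / 2.0                      # (1) all H -> H2O(g)
    if h2o > n_o:
        # The rules do not say what to do here; refusing is an assumption.
        raise ValueError("not enough oxygen to convert all H to H2O")
    o_left = n_o - h2o
    co2 = min(n_c, o_left / 2.0)         # (2) as much C as possible -> CO2(g)
    c_solid = n_c - co2                  # (3) remaining C -> C(s)
    o_left -= 2.0 * co2
    return {
        "H2O": h2o,
        "CO2": co2,
        "C(s)": c_solid,
        "N2": n_n / 2.0,                 # (4) all N -> N2(g)
        "O2": o_left / 2.0,              # (5) remaining O -> O2(g)
        "X2": n_x / 2.0,                 # (6) all halogen X -> X2(g)
    }
```

For a C₃H₆N₆O₆ composition, `kamlet_jacobs_products(3, 6, 6, 6)` yields 3 H₂O, 1.5 CO₂, 1.5 C(s) and 3 N₂, with no O₂ left over. For an oxygen-rich composition such as C₃H₅N₃O₉, step (5) assigns the surplus to 0.25 O₂.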

To summarize, a variety of schemes for determining detonation products have been developed. It is not always clear which one to apply, especially if more exotic molecular compositions and elements other than CHNO are involved. We will revisit this problem below.

2.3. Enthalpy of Formation

In addition to the density, the solid state enthalpy of formation is sometimes used as an input parameter for modeling detonation properties of energetic materials, for example by thermochemical codes. A basic model for computing enthalpies of formation amounts to considering the general formation reaction

a C (s) + \frac{b}{2} H_{2} (g) + \frac{c}{2} N_{2} (g) + \frac{d}{2} O_{2} (g) + \frac{e}{2} X_{2} (g) \to C_{a} H_{b} N_{c} O_{d} X_{e} (g / s)

where, on the left side, all constituents except for solid carbon (graphite) are in the gas phase. The product on the right side can be chosen to be either gaseous or solid, defining the gas phase [ΔH_f(g)] and solid state [ΔH_f(s)] enthalpies of formation. These are, in turn, related via the sublimation enthalpy:

Δ H_{f} (s) = Δ H_{f} (g) - Δ H_{sub}

In this work, we evaluate ΔH_f(g) according to the formation reaction above by considering the total enthalpies obtained via DFT computations. To obtain ΔH_f(s), we will later endeavor to include the sublimation enthalpy implicitly via our ML models.

Practically speaking, all components of the formation reaction except for solid carbon are easily amenable to quantum chemistry approaches. To obtain a reference value for carbon, we use 1/60 of the energy of C₆₀ fullerene instead. This is certainly a crude approximation, but we will show below that it is well compensated via the ML schemes. Thus, in summary, ΔH_sub and the correction for C₆₀ are both left to be corrected within the ML models.
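Written out, the formation reaction combined with the C₆₀ substitution suggests the following working expression for the gas phase value. This explicit form is inferred from the description above rather than quoted from the original, and H(·) denotes the computed total enthalpy of each species.

```latex
\Delta H_{f}(g) = H(C_{a} H_{b} N_{c} O_{d} X_{e})
  - \left[ \frac{a}{60} H(C_{60}) + \frac{b}{2} H(H_{2}) + \frac{c}{2} H(N_{2})
  + \frac{d}{2} H(O_{2}) + \frac{e}{2} H(X_{2}) \right]
```

Any systematic offset between H(C₆₀)/60 and the enthalpy of graphite scales with the number of carbon atoms a, which is the kind of regular error a fitted model can plausibly absorb.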

3. Methods

Having discussed the overall background and state of the art, we now proceed to a description of our specific approach. We will first discuss the curation of our data set of experimental reference values before outlining specific details of the quantum chemistry computations performed to obtain our database of descriptors. Subsequently, we present our approach toward determining optimal product distributions. We will conclude with a discussion of the postprocessing steps and more technical details on machine learning and model evaluation.

3.1. Data Collection

Experimental reference data are somewhat scarce for energetic materials, particularly with regard to their detonation properties. In 2022, Muravyev et al. published an open data set of energetic materials containing detonation parameters for 260 CHNOFCl structures. The data set contains experimental (loading/charge) densities, theoretical maximum (crystalline) densities, solid state enthalpies of formation, detonation velocities and pressures, and heats of detonation. In this work, we consider structures from the Muravyev data set only if they are single-molecule structures (no ionic structures or cocrystals) and solid under experimental conditions. Upon refining the Muravyev data set, we retain 132 structures for which reference data are available. While reference values for the densities and detonation velocities are available for all of these structures, some values for the enthalpies of formation are absent. A selection of the studied molecules is shown in Figure 2. We found that some of the detonation velocity values from the Muravyev data set differ from other values published in the literature. These discrepancies and the final values used for our data set are discussed in more detail in Section S2.

Figure 2. Selection of the 132 energetic molecules studied within this work.

Noting that the density is an elementary property of molecular solids, not specific to energetic molecules, we have constructed an additional data set using the Cambridge Structural Database (CSD). To create this larger data set for density modeling, we queried the CSD according to the following criteria. First, the molecules must contain C, H, N, and O atoms, and may contain heavier atoms up to Cl but excluding Li, Be, Ne, Na, Mg and Al. Next, the crystalline density must lie between 1.5 and 2.5 g cm^–3, and the molecular mass must be no larger than 500 g mol^–1. The entries must contain only single-molecule structures; this excludes ions, cocrystals and solvates. We also exclude powder structures, structures with significant disorder, and structures which are labeled as having significant error. The R-factor for the entry must be less than 0.05. Finally, we only consider structures which have been added to the CSD since November 2020, on the assumption that newer entries may be more accurate than legacy entries. These criteria reduce the millions of entries in the CSD to a number that is manageable in terms of computational effort and data curation. After removing duplicate entries from the CSD along with all energetics from the first set, our query returned 801 unique molecules, which are used for our second density data set and are described in the Supporting Information.

3.2. Quantum Chemistry Computations

In this section, we provide the details of the quantum chemistry calculations performed in order to obtain molecular descriptors. Geometry optimizations, vibrational analyses, and additional single-point computations were performed for all molecules in ORCA, using the r²SCAN-3c method. All structures were optimized and confirmed as minimum energy structures via the vibrational analysis. Using the vibrational analyses, we compute the reactant total enthalpy (H) as well as the vibrational enthalpy (H_vib) and entropy (S_vib), as determined via the modified rigid-rotor harmonic oscillator (RRHO) model. Polarizabilities (α), dipole moments (μ) and quadrupole moments (Q) are computed for all molecules at their optimized geometries at the r²SCAN-3c level. Molecular volumes (V) and surface areas (SA) are determined by the GEPOL algorithm, where atoms are first represented with spherical surfaces corresponding to their van der Waals radii. The algorithm then uses a tessellation approach to select the parts of these spherical surfaces which form the molecular surface. In practice, these descriptors are obtained by running a single-point conductor-like polarizable continuum model (CPCM) solvation computation with the dielectric constant and the refractive index both set to 1. These descriptors are summarized in Table 1.

Table 1. Descriptors Used in This Work from ORCA DFT (r²SCAN-3c) Calculations (a)

Name	Descriptor
H (H_vol)	Total enthalpy (per volume)
E_disp	Dispersion energy
H_vib	Vibrational enthalpy
S_vib	Vibrational entropy
ΔH_f(g)	Gas-phase enthalpy of formation (from the formation reaction, Section 2.3)
q_cal	Enthalpy of explosion (cal g^–1)
SA (SA_vol)	Molecular surface area (per volume)
V	Molecular volume
μ (μ_vol)	Molecular dipole moment (per volume)
Q (Q_vol)	Molecular quadrupole moment (per volume)
α (α_vol)	Trace of molecular polarizability tensor (per volume)

(a) The subscript “vol” denotes that the volume-normalized descriptor is also considered.

Aside from the descriptors presented above, we aim to obtain a more detailed description of how the target molecule interacts with a condensed phase environment. For this purpose, we perform computations using the SMD solvation model. The motivation behind this approach is not to model a specific solvent but to learn how the molecule generally interacts with a condensed phase environment and to determine descriptors related to the individual interaction terms. Our assumption is that if a molecule is strongly stabilized by a solvent due to electrostatic interactions or hydrogen bonds, then the same interactions may also operate in the solid state, providing additional cohesive energy. SMD computations are performed at the ωB97M-V/def2-TZVP level of theory, as implemented in Q-Chem 6.0.

The electrostatics modeled by the SMD solvation approach (see Table 2) are based on the electron density. The overall solvation energy can be partitioned into electrostatic (E_GENP) and cavitation-dispersion (CDS) terms. The GENP term depends on the dielectric constant ϵ of the solvent, and we use a value of ϵ = 2 here to model a generic reaction field. In contrast, the CDS term is obtained as a response to other solvent properties. Of immediate interest in this work are Abraham’s hydrogen bond acidity (SolA) and basicity (SolB) of the solvent. Within Q-Chem 6.0, the van der Waals radii of oxygen atoms are scaled according to SolA below a value of 0.5. In the interest of consistency, we perform a reference computation where SolA is set to 0.5, and the GENP term is taken from this computation. Then, SolA and SolB are independently increased to 0.93 and 0.5, respectively. The difference between the reference computation and the computations where SolA and SolB are “switched on” yields two cavitation-dispersion terms representing the response to hydrogen bond acidity (E_CDSA) and hydrogen bond basicity (E_CDSB).

Table 2. Descriptors Used in This Work from Q-Chem SMD (ωB97M-V/def2-TZVP) Calculations (a)

Name	Descriptor
E_CDSA	Solvation: H-bond acidity term
E_CDSB	Solvation: H-bond basicity term
E_GENP	Solvation: electrostatic energy

(a) These descriptors are normalized by volume by default.

3.3. Optimal Product Distributions

Before we can proceed, we need to address one crucial question. Some of the central parameters discussed above, such as q_cal, n_gas, and m̅_gas, depend on the products formed. We discussed the standard rules in operation in Section 2.2. However, such empirical rule-based schemes have two clear problems: (i) if competing schemes exist for a given molecular composition, then it is not clear which one to choose, and (ii) such schemes are not available for all elements potentially of interest. To avoid these problems, we introduce here a nonempirical scheme for determining detonation products which is applicable to all molecular compositions without the need to adjust or modify steps. Our approach is to perform an optimization of products in order to maximize the enthalpy of explosion, thus obeying the “maximal heat release principle” by construction. We provide a mathematical formulation for this approach here.

The optimization approach begins by considering the general reaction equation for the detonation process

C_{b_{1}} H_{b_{2}} N_{b_{3}} O_{b_{4}} X_{b_{5}} ... \to x_{1} H_{2} O + x_{2} {CO}_{2} + x_{3} N_{2} + x_{4} X_{2} + ...

where the left side represents the chemical composition of the target molecule and the right side represents all potential products formed. The composition of the reactant molecule can thus be summarized as a vector b and the product distribution as a vector x. Furthermore, we define a matrix A whose rows and columns represent the individual chemical elements and products, respectively. For example, H₂O is represented via A₂₁ = 2 and A₄₁ = 1, and CO₂ via A₁₂ = 1 and A₄₂ = 2. The stoichiometry of the detonation reaction can now be represented via the matrix-vector product

Ax = b

where any vector x that solves this linear system provides a properly balanced reaction when inserted into the detonation reaction above. In addition, we require that all elements of x are non-negative.

The enthalpy of explosion is given as

q_{cal} = \frac{H_{prod.} - H (C_{b_{1}} H_{b_{2}} N_{b_{3}} O_{b_{4}} X_{b_{5}} ...)}{M}

H_{prod.} = x_{1} H (H_{2} O) + x_{2} H ({CO}_{2}) + x_{3} H (N_{2}) + x_{4} H (X_{2}) + ...

H_{prod.} = c^{T} x

where, in the last step, we have defined a vector c whose entries correspond to the enthalpies of the different possible products. These enthalpies derive from small-molecule gas-phase quantum chemistry computations. For carbon we again use C₆₀ as a reference. While this is certainly an approximation, we also note that the possible alternative of using perfectly monocrystalline and defect-free graphite does not seem more justified in representing the products of a violent detonation process.

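With the sign convention used above, an exothermic detonation has a negative q_cal, so maximizing the heat released corresponds to minimizing q_cal. Since the reactant enthalpy and M are fixed for a given molecule, minimizing q_cal is equivalent to minimizing c^T x. We can now rephrase the optimization problem as the task of finding the vector x that minimizes c^T x subject to the constraints that Ax = b and that all elements of x are non-negative, that is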

\begin{array}{ll} \min_{x} & c^{T} x \\ \text{subject to} & A x = b \\ & x_{i} \geq 0 \quad \forall i \end{array}

This is a standard linear programming task for which a number of efficient implementations have been devised; here we use SciPy. We will evaluate the performance of this optimization scheme against the KJ rules in conjunction with the Kamlet–Jacobs equation, and subsequently feed the optimized product data into our ML models.

The described approach will work whenever the optimization problem above is well-defined and possesses a solution. In practice, this means that suitable potential detonation products for all relevant chemical elements have to be included in the procedure. Aside from this, the approach is completely general.
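The linear program can be sketched with `scipy.optimize.linprog`. In the example below, the product list, the function name, and in particular the cost vector are assumptions: tabulated standard gas-phase formation enthalpies (kJ mol^–1) stand in for the computed total enthalpies used in the original work. For a fixed composition b, the two choices differ only by a constant in c^T x, apart from the carbon reference, so the minimizer is expected to behave similarly.

```python
import numpy as np
from scipy.optimize import linprog

ELEMENTS = ["C", "H", "N", "O"]                       # rows of A
PRODUCTS = ["H2O", "CO2", "CO", "C(s)", "N2", "O2", "H2"]  # columns of A

# A[i, j] = number of atoms of element i in product j
A = np.array([
    [0, 1, 1, 1, 0, 0, 0],   # C
    [2, 0, 0, 0, 0, 0, 2],   # H
    [0, 0, 0, 0, 2, 0, 0],   # N
    [1, 2, 1, 0, 0, 2, 0],   # O
], dtype=float)

# Stand-in cost vector (assumption): standard formation enthalpies in kJ/mol.
c = np.array([-241.8, -393.5, -110.5, 0.0, 0.0, 0.0, 0.0])


def optimal_products(b):
    """Minimize c^T x subject to A x = b and x >= 0."""
    res = linprog(c, A_eq=A, b_eq=np.asarray(b, dtype=float),
                  bounds=(0, None), method="highs")
    if not res.success:
        raise ValueError(res.message)
    return dict(zip(PRODUCTS, res.x))
```

For b = (3, 6, 6, 6), that is a C₃H₆N₆O₆ composition, this sketch recovers the same distribution as the KJ rules (3 H₂O, 1.5 CO₂, 1.5 C(s), 3 N₂), consistent with the observation that those rules follow the maximal heat release principle. Supporting further elements only requires adding rows to A and columns for the corresponding candidate products.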

3.4. Postprocessing

To carry out the remaining steps, we developed an in-house Python toolkit, EMProp, for data parsing, postprocessing, and machine learning. Here we explain in more detail the procedures used to extract and manipulate the data. The descriptors determined at this step are summarized in Table 3. The procedure begins by reading in the 3D geometry of the molecule using OpenBabel. The molecular mass (M) is extracted, alongside other descriptors which OpenBabel provides: the octanol–water partition coefficient (log P), the topological polar surface area (TPSA), the molar refractivity (A), and the number of rotors (N_rot). The trivial density estimate (ρ_triv) is obtained by dividing the molecular mass by the molecular volume (Section 2.1).

Table 3. Molecular Descriptors Generated in the Post-Processing Stage

Name	Descriptor
N_X	No. of atoms of X (X = C, H, N, O, Cl, F, S)
N_NO	Number of nitrogen + oxygen atoms
OB	Oxygen balance (eq 13)
A_HB/D_HB	Number of hydrogen bond acceptors/donors
N_rot (N_rot,vol)	Number of rigid rotors (per volume)
M	Molecular mass
ρ_triv	Trivial density (M/V, Section 2.1)
A	Molar refractivity
log P	Octanol–water partition coefficient
TPSA (TPSA_vol)	Topological polar surface area (per volume)
m̅_gas	Average M of gases produced by detonation
n_gas	Moles of gas produced by detonation

Next, we extract the number of hydrogen bond acceptors (A_HB) and donors (D_HB) by querying the structure with SMARTS strings. We denote hydrogen bond acceptors as N- or O-containing functional groups (excluding the N atoms of pyrrole, nitro, or amide groups). We consider any nitrogen or oxygen atom with at least one H atom attached as a hydrogen bond donor. SMARTS matching is also used to obtain the number of C, H, N, O, F, Cl, and S atoms present in the structure (N_X, where X = C, H, N, O, F, Cl, or S). We use the atom counts to compute the oxygen balance (OB):

OB = \frac{- 1600}{M} \times \left( 2 N_{C} + \frac{N_{H}}{2} - N_{O} \right)

as well as the NO count (N_NO), which is simply the sum of the number of N atoms and O atoms.
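Both quantities follow directly from the atom counts. The sketch below uses the standard form of the oxygen balance, in which N_O sits inside the bracket together with the carbon and hydrogen terms; the function names are illustrative.

```python
def oxygen_balance(n_c, n_h, n_o, molar_mass):
    """Oxygen balance in percent for a CHNO compound (molar_mass in g mol^-1)."""
    return -1600.0 / molar_mass * (2 * n_c + n_h / 2.0 - n_o)


def no_count(n_n, n_o):
    """N_NO: the number of nitrogen atoms plus the number of oxygen atoms."""
    return n_n + n_o
```

As a check, a C₇H₅N₃O₆ composition with M = 227.13 g mol^–1 gives an oxygen balance of about −74%, and C₃H₆N₆O₆ with M = 222.12 g mol^–1 gives about −21.6%.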

Next, the gas phase enthalpy of formation (ΔH_f(g)) is determined according to the formation reaction in Section 2.3, using the ab initio total enthalpies, H, from the previous section. Following this, the expected products of the detonation reaction are determined either through the rule-based KJ scheme or through our optimization approach, and from these we can deduce the number of gas molecules produced (n_gas), the average molecular weight of these gases (m̅_gas) and the total enthalpy of the products. The specific (weight-normalized) enthalpy of the reaction in calories per gram (q_cal) is determined by subtracting the enthalpy of the reactant (H) from the enthalpy of the products and dividing by M, as defined in Section 3.3.

In some cases, we normalize descriptors by V in order to transform extensive descriptors into intensive ones. For example, dividing M by V, which are both extensive descriptors, yields a new intensive descriptor in the form of a trivially computed density (ρ_triv). We also create a descriptor based on the surface area to volume ratio (SA_vol), as well as the topological polar surface area per unit volume (TPSA_vol). The solvation descriptors E_GENP, E_CDSA, and E_CDSB are normalized by V by default. Likewise, the A_HB and D_HB descriptors are also normalized, since the effect of hydrogen bond accepting or donating groups would generally be expected to depend on system size. We also consider the volume-normalized total enthalpy of the molecule, H_vol, in addition to the raw value, since it is unclear how much an ML model can infer about system size from the total enthalpy. In other cases, for example, electronic properties such as polarizabilities, dipole moments and quadrupole moments, the situation is less clear since vector quantities do not necessarily scale with system size. For these, we consider both the raw and volume-normalized forms, adding α_vol, μ_vol and Q_vol to our descriptor set. Finally, we consider both the raw number of rotors in the molecule and the number of rotors per unit volume (N_rot,vol).

3.5. Machine Learning

The scikit-learn package is used within EMProp to construct the machine learning models in this work. We reduce the dimensionality of our descriptor set by first eliminating highly correlated features according to Pearson correlation coefficients and then performing principal component analysis; further details are provided in Section 4. We evaluate linear regression, ridge regression (RR), kernel ridge regression (KRR), and multilayer perceptron neural network regressor (MLP-NN) models. For densities and detonation velocities, the MLP-NN model is selected, while for solid state enthalpies of formation, RR is used. Model parameters are selected via a grid search cross-validation procedure. For standard model evaluation, predictions are made using 5-fold cross-validation (always using an 80/20 split) with the parameters selected by the grid search procedure, and we report performance metrics averaged over the training and test sets from each fold. Where the model is trained and tested on separate data sets, the performance metrics correspond to the values computed from the test data.
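The workflow described here (correlation filter, PCA, grid search, then 5-fold evaluation) might look as follows in scikit-learn. This is a minimal sketch and not the EMProp implementation: the correlation threshold of 0.95, the retained variance of 95%, the α grid, the feature scaling step, and the function names are all assumptions, and ridge regression is used only because it is quick to fit.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def drop_correlated(X, threshold=0.95):
    """Indices of columns kept after dropping one of each highly correlated pair."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # Pearson correlation matrix
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep


def evaluate_ridge(X, y, seed=0):
    """Grid-search alpha, then report 5-fold train/test MAE with that alpha."""
    keep = drop_correlated(X)
    X_kept = X[:, keep]
    pipe = make_pipeline(StandardScaler(),
                         PCA(n_components=0.95, svd_solver="full"),
                         Ridge())
    grid = GridSearchCV(pipe, {"ridge__alpha": [0.01, 0.1, 1.0, 10.0]},
                        cv=5, scoring="neg_mean_absolute_error")
    grid.fit(X_kept, y)
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)  # 80/20 per fold
    scores = cross_validate(grid.best_estimator_, X_kept, y, cv=cv,
                            scoring="neg_mean_absolute_error",
                            return_train_score=True)
    return keep, -scores["train_score"].mean(), -scores["test_score"].mean()
```

Swapping `Ridge()` for `MLPRegressor` from `sklearn.neural_network` and adjusting the parameter grid would give the analogous setup for a neural network model. Placing the scaler and PCA inside the pipeline means they are refitted on each training fold, which avoids leaking test-fold information into the transformation.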

3.6. Model Evaluation

The metrics used to evaluate model performance in this work are summarized here. The simplest metric is the mean absolute error (MAE):

MAE = \frac{1}{N} \sum_{i}^{N} | y_{i}^{true} - y_{i}^{pred} |

where the absolute error of the predicted $(y_{i}^{pred})$ from the true $(y_{i}^{true})$ values for a given quantity of interest is averaged across all N molecules. We categorize the prediction quality for densities and detonation velocities according to the individual absolute errors of the predictions. To categorize the quality of density predictions, we use ranges of absolute error following the work of Goh et al., who proposed that excellent, informative, and poor predictions are characterized by errors below 0.03 g cm^–3, between 0.03 and 0.05 g cm^–3, and above 0.05 g cm^–3, respectively. For detonation velocity we consider an absolute error of 0.5 km s^–1 to be acceptable.

In addition to the MAE, we compute the root-mean-squared error (RMSE):

RMSE = \sqrt{\frac{1}{N} \sum_{i}^{N} {(y_{i}^{true} - y_{i}^{pred})}^{2}}

which is the quadratic mean of the difference between the predicted and true values. The RMSE is more sensitive to outliers than the MAE. Finally, we compute the coefficient of determination, R ², which represents the proportion of variation in the true values explained from the input features. R ² is expressed through the residual sum of squares and the total sum of squares:

R^{2} = 1 - \frac{\sum {(y_{i}^{true} - y_{i}^{pred})}^{2}}{\sum {(y_{i}^{true} - {y̅}^{true})}^{2}}

where y̅^true is the mean of the true values. The maximum value of R ² is 1, which implies that the RMSE vanishes and indicates that the input features are able to perfectly model the true values. As a technical note, the Pearson correlation coefficient (R) would only be defined for a linear regression analysis, whereas the coefficient of determination (R ²) can be generalized to nonlinear regression via eq , which is why we use the latter here.
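For concreteness, the three metrics defined above can be written in a few lines of plain Python. The function names and the toy values in the usage lines are illustrative only and are not part of the data sets used in this work.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-squared error (quadratic mean of the residuals)."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy example: one prediction out of four is off by one unit.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
example = (mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```

In this toy case the single outlier gives an MAE of 0.25 but an RMSE of 0.5, which illustrates the stronger sensitivity of the RMSE to outliers.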

4. Results and Discussion

We first present results using the basic literature models described in Section as a reference. Following this, we proceed to machine learning techniques discussing, first, feature selection and proceeding to a presentation of our best models. We conclude with an application demonstration studying a set of isomeric cyclobutane nitric esters.

4.1. Basic Models

Our starting point for the prediction of densities is eq , applied to the molecular volumes computed with the GEPOL algorithm as detailed in Section . The results are shown for the CCDC and Muravyev (energetic) data sets in Figure .

Trivial densities (eq ) compared to the experimental densities for the 801 molecules extracted from the CCDC (left) and the 132 from the Muravyev data set (right). The red line indicates perfect agreement with the experiment, and the blue line is a regression line for our predicted values. The green and yellow shaded regions encompass absolute errors of ±0.03 g cm^–3 and ±0.05 g cm^–3 respectively, indicating excellent or informative predictions. The percentage of predictions within each shaded region are displayed in the pie charts.

For the CCDC data set, the coefficient of determination, R² is 0.5725 and RMSE/MAE of 0.071/0.055 g cm^–3 are obtained. We find that only 267 (33.3%) predictions are considered to be excellent, while 170 (21.2%) are considered informative. Nearly half of the predictions (364, 45.4%) are considered to be poor. Clearly this trivial model leaves plenty of room for improvement.
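The classification used here can be sketched as follows. The thresholds are those quoted in Section 3.6; how errors of exactly 0.03 or 0.05 g cm^–3 are assigned is an assumption of this sketch, and the function names are hypothetical. The last lines simply recompute the percentages from the counts reported above.

```python
def classify_density_error(abs_error):
    """Label an absolute density error (g/cm^3) by prediction quality."""
    if abs_error < 0.03:
        return "excellent"
    if abs_error <= 0.05:   # boundary handling is an assumption
        return "informative"
    return "poor"


def category_shares(y_true, y_pred):
    """Percentage of predictions falling into each quality category."""
    counts = {"excellent": 0, "informative": 0, "poor": 0}
    for t, p in zip(y_true, y_pred):
        counts[classify_density_error(abs(t - p))] += 1
    return {name: 100.0 * c / len(y_true) for name, c in counts.items()}


# Counts reported for the trivial model on the CCDC data set.
ccdc_counts = {"excellent": 267, "informative": 170, "poor": 364}
total = sum(ccdc_counts.values())
ccdc_percent = {name: round(100.0 * c / total, 1) for name, c in ccdc_counts.items()}
```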

Moving from the general CCDC data set to the energetics, Figure (right) further highlights the shortcomings of the trivial model. Over half of the predictions are found to be poor, while only 29 (22.0%) excellent and 27 (20.5%) informative predictions are made out of the 132 molecules in the data set. On inspection of Figure , we recognize that almost all of the poor predictions are underpredictions at higher densities, consistent with previous studies by Rice et al. The enhanced errors for the energetics data set are reflected in the increased error metrics (RMSE = 0.100 g cm^–3, MAE = 0.081 g cm^–3) compared to the CCDC data set.

Having evaluated a trivial approach to computing densities, we now move to assess the performance of eq for computing detonation velocities. We use the experimental charge/loading densities of the energetic molecules at which the detonation velocity data were obtained. The remaining terms in eq relate to the products of the detonation reaction. As a first option, we determine these via the Kamlet–Jacobs rules as given in Section . The results are shown on the left-hand side of Figure . At first glance, the empirical model seems to perform adequately for detonation velocity prediction, providing error metrics within the bounds of ±0.5 km s^–1. However, on closer inspection one finds that detonation velocities above 5 km s^–1 are systematically underpredicted. In total, there are 22 predictions with an absolute error of more than 0.5 km s^–1, and of those, 20 are underpredictions.

Computed detonation velocities (eq ) compared to the experimental detonation velocities for the 132 molecules in the Muravyev data set, using the Kamlet–Jacobs rules (left) and our optimization scheme (right) for detonation products. The red line indicates perfect agreement with experiment and the blue line is a regression line for our predicted values. The green shaded region encompasses an absolute error of ±0.5 km s^–1.

Considering the relatively poor performance of the results obtained in conjunction with the standard Kamlet–Jacobs rules, we have developed an alternative scheme with an intrinsically optimized product distribution, as explained in Section . The results, as shown on the right-hand side of Figure , are clearly improved in all relevant metrics. On inspection of the right-hand side of Figure , we recognize that there are significantly fewer underpredictions in the detonation velocity above 6 km s^–1 using the optimization scheme, in contrast to the rule-based scheme on the left-hand side of Figure . In total, there are 18 predictions with an absolute error of more than 0.5 km s^–1. The improvement in the prediction of detonation velocities using eq with the data from the product optimization scheme is reflected by the improved error metrics, with an R ² of 0.8865, an RMSE of 0.374 km s^–1, and an MAE of 0.267 km s^–1. We can attribute this in part to the increased enthalpies of explosion computed with the product optimization scheme compared to the Kamlet–Jacobs rules, noting that changes to the distribution of products will have smaller effects on the remaining terms in eq (moles of gaseous products and average gas weight). Furthermore, whereas previous work has denoted the Kamlet–Jacobs rules as attempting to obey the maximal heat release principle, we argue that our product optimization scheme does so by construction.

It is instructive to examine the largest underpredictions between the left and right-hand side of Figure in order to discern the difference, in terms of product distribution and enthalpy of explosion, between the Kamlet–Jacobs rules and our product optimization scheme. The largest outlier using the Kamlet–Jacobs rules is TFTNT (trifluoro-trinitrotoluene, Figure ) appearing at an experimental detonation velocity of 7.5 km s^–1, which is 2.22 km s^–1 higher than the predicted value. According to the Kamlet–Jacobs rules, TFTNT would decompose according to:

C_{7} H_{2} N_{3} O_{6} F_{3} \to 1 H_{2} O + 2.5 {CO}_{2} + 4.5 C (s) + 1.5 N_{2} + 1.5 F_{2}

and produces an enthalpy of explosion of 322.17 cal g^–1. Conversely, our product optimization scheme finds that

C_{7} H_{2} N_{3} O_{6} F_{3} \to 3 {CO}_{2} + 4 C (s) + 1.5 N_{2} + 0.5 F_{2} + 2 HF

resulting in a more than doubled enthalpy of explosion (754.14 cal g^–1) and a vast improvement in the detonation velocity, underpredicted now by only 0.80 km s^–1. The most notable difference in the product distributions is the absence of any water being formed but, instead, the hydrogen atoms are used to produce HF. Furthermore, not producing water produces an extra 0.5 mol of gaseous products (forms less C(s)). Therefore, in addition to the increased enthalpy of explosion, the product set from the optimization scheme also increases n _gas/m̅_gas.
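The bookkeeping behind this comparison can be checked with a short script. The sketch below verifies the atom balance of both TFTNT product sets, counts the moles of gaseous products, and estimates the relative change in detonation velocity at fixed density. It assumes the standard Kamlet–Jacobs form D = 1.01 ϕ^{1/2} (1 + 1.30 ρ) with ϕ = N M^{1/2} Q^{1/2}, where N is the number of moles of gas per gram of explosive, M the average molar mass of the gaseous products, and Q the enthalpy of explosion in cal g^–1; all function and variable names are hypothetical.

```python
from math import sqrt

# Rounded standard atomic masses in g/mol.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "F": 18.998}

# Elemental composition of the product species appearing in the two reactions.
SPECIES = {
    "H2O": {"H": 2, "O": 1},
    "CO2": {"C": 1, "O": 2},
    "C(s)": {"C": 1},
    "N2": {"N": 2},
    "F2": {"F": 2},
    "HF": {"H": 1, "F": 1},
}
SOLIDS = {"C(s)"}


def molar_mass(composition):
    return sum(ATOMIC_MASS[el] * n for el, n in composition.items())


def atom_totals(products):
    """Total number of atoms of each element on the product side."""
    totals = {}
    for species, moles in products.items():
        for el, n in SPECIES[species].items():
            totals[el] = totals.get(el, 0.0) + n * moles
    return totals


def gas_moles(products):
    return sum(m for s, m in products.items() if s not in SOLIDS)


def kj_phi(reactant, products, q_cal):
    """phi = N * sqrt(M * Q) in the assumed Kamlet-Jacobs form."""
    n_gas = gas_moles(products)
    gas_mass = sum(
        molar_mass(SPECIES[s]) * m for s, m in products.items() if s not in SOLIDS
    )
    n_per_gram = n_gas / molar_mass(reactant)   # mol of gas per g of explosive
    mean_gas_mass = gas_mass / n_gas            # g/mol
    return n_per_gram * sqrt(mean_gas_mass * q_cal)


TFTNT = {"C": 7, "H": 2, "N": 3, "O": 6, "F": 3}
KJ_PRODUCTS = {"H2O": 1.0, "CO2": 2.5, "C(s)": 4.5, "N2": 1.5, "F2": 1.5}
OPT_PRODUCTS = {"CO2": 3.0, "C(s)": 4.0, "N2": 1.5, "F2": 0.5, "HF": 2.0}

extra_gas = gas_moles(OPT_PRODUCTS) - gas_moles(KJ_PRODUCTS)

# At fixed loading density D scales with sqrt(phi), so the density cancels.
velocity_ratio = sqrt(
    kj_phi(TFTNT, OPT_PRODUCTS, 754.14) / kj_phi(TFTNT, KJ_PRODUCTS, 322.17)
)
```

Under these assumptions, `extra_gas` evaluates to 0.5 mol and `velocity_ratio` comes out close to the ratio implied by the errors quoted above, (7.5 − 0.80)/(7.5 − 2.22).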

The next largest underprediction (1.21 km s^–1) is for DFTNB (difluoro-trinitrobenzene, see Figure ) possessing an experimental detonation velocity of 7.8 km s^–1. The Kamlet–Jacobs rules result in the detonation reaction:

C_{6} H N_{3} O_{6} F_{2} \to 0.5 H_{2} O + 2.75 {CO}_{2} + 3.25 C (s) + 1.5 N_{2} + 1 F_{2}

which has an associated enthalpy of explosion of 689.00 cal g^–1. Our product optimization scheme again suggests the formation of HF rather than water, resulting in an increased number of moles of gaseous products similar to the TFTNT case:

C_{6} H N_{3} O_{6} F_{2} \to 3 {CO}_{2} + 3 C (s) + 1.5 N_{2} + 0.5 F_{2} + 1 HF

Again, we find a strongly increased enthalpy of explosion (932.75 cal g^–1). Using the data from the product optimization scheme, the error in the detonation velocity prediction is halved; underpredicted by only 0.60 km s^–1.

As a final example, we compare the differences for DANTNP [bis(amino-nitro-triazolyl)-nitropyrimidine, Figure ], which in contrast to the previous molecules does not contain any halogen atoms. The experimental detonation velocity of DANTNP is 8.2 km s^–1. Using the Kamlet–Jacobs rules, an enthalpy of explosion of 883.98 cal g^–1 is obtained by the reaction:

C_{8} H_{5} N_{13} O_{6} \to 2.5 H_{2} O + 1.75 {CO}_{2} + 6.25 C (s) + 6.5 N_{2}

and the detonation velocity is underpredicted by 1.03 km s^–1 using this data. In contrast, our product optimization approach finds that

C_{8} H_{5} N_{13} O_{6} \to 1.25 {CH}_{4} + 3 {CO}_{2} + 3.75 C (s) + 6.5 N_{2}

produces a slightly larger enthalpy of explosion (951.50 cal g^–1) which results in an improved detonation velocity prediction, underpredicted by only 0.72 km s^–1. While the enthalpy of explosion is only around 70 cal g^–1 larger using the optimization approach compared to the Kamlet–Jacobs rules, we identify that the production of CH₄ rather than H₂O is preferred, and allows the production of an increased amount of gaseous products and less C(s) overall. Therefore, the interplay of all the different terms which comprise ϕ in eq can be identified to have a significant influence on detonation velocity predictions, even in spite of the density generally being considered the most important parameter.

To conclude our discussion of the basic models, we show the results of computing gaseous enthalpies of formation against the reference solid state values in Figure . The presented results are DFT-computed values using a modern density functional along with thermostatistical corrections for vibrations and rotations but include no machine learning corrections or other parametrization. We generally find a reasonable correlation between theory and experiment. The mean absolute error between our gas-phase values and the experimental values is 143.9 kJ mol^–1, while the RMSE is 173.5 kJ mol^–1. The coefficient of determination is 0.8944. Surprisingly, the computed values are generally lower than the experimental solid state enthalpy of formation; according to eq , one might expect that our computed values would be greater than the solid state enthalpy of formation by the sublimation enthalpy. It is not immediately clear why this is the case. However, we believe one contributing factor is that our “reference” value for the enthalpy of solid carbon is not that of graphite, but is instead obtained from a computation on gaseous C₆₀, as discussed in Section . As a result, the associated total enthalpy of C(s) from eq is larger than it would be for graphite, thus lowering our trivially computed gas phase enthalpy of formation. We will show below that this contribution, along with the sublimation enthalpy, is readily accounted for within the ML models.

Gas-phase enthalpies of formation compared to the experimental solid state enthalpies of formation for the 124 molecules in the Muravyev data set. The red solid line indicates perfect agreement with experiment and the blue solid line is a regression line.

4.2. Machine Learning

Noting that there is clear room for improvement for all the above-presented literature approaches, we proceed to an evaluation of machine learning models. We first briefly discuss our available descriptors and use a dimensionality reduction to obtain a set of linearly independent features. Subsequently, we present cross-validated predictions on density, detonation velocity and enthalpy of formation. It will be of particular interest to evaluate whether all three can be modeled using all in silico approaches not requiring experimental inputs (other than the reference data).

4.2.1. Feature Selection

In our attempt to bridge between properties of isolated molecules and macroscopic quantities it is first important to decide which features (i.e., molecular properties) are important. To do so, we compute the mutual Pearson correlation coefficients between all pairs of features as presented in Tables –. We identify highly correlated features as those with a Pearson coefficient of more than 0.7, and where several features are highly correlated we keep only the one which is most correlated to the parameter of interest. For the detonation velocity and enthalpy of formation, all 41 descriptors from Tables – are considered. In addition, the experimental loading density is included for the former. For the density data set, we exclude H, H _vol, m̅_gas, n _gas, q _cal, ΔH _f(g), as these are not expected to have any influence on the density, leading to 34 descriptors. Starting with these sets we are left with 15/22/20 descriptors for ρ/D/ΔH _f(s) as summarized in Table .

Table 4. Descriptor Sets Used for Machine Learning the Three Quantities of Interest after Removing Highly Correlated Descriptors According to Their Pearson Coefficients

Target	Descriptor set
Density	E _CDSB, E _GENP, H _vib, S _vib, SA_vol, Q _vol, N _Cl, N _F, N _S, N _NO, OB, D _HB, N _rot,vol, ρ_triv, log P
D	E _CDSA, E _CDSB, E _GENP, H _vib, ΔH _f(g), μ, Q _vol, α_vol, N _C, N _Cl, N _F, N _N, N _NO, OB, A _HB, D _HB, N _rot,vol, q _cal, TPSA_vol, log P, m̅_gas, ρ_exp
ΔH _f(s)	E _CDSA, E _CDSB, E _GENP, H _vol, ΔH _f(g), μ, α_vol, N _C, N _Cl, N _F, N _H, N _N, N _O, N _rot,vol, A _HB, D _HB, q _cal, TPSA_vol, log P, m̅_gas

Viewing Table , we note that, already at this stage, the density and ΔH _f(s) are modeled with only in silico descriptors. By contrast, we initially use one experimental quantity, the loading density (ρ_exp), for modeling detonation velocities, as is common practice. However, we will later present a strategy for avoiding ρ_exp and will finally end up with pure in silico strategies for all three quantities.
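The correlation-based filtering described in this section can be illustrated with a small, dependency-free sketch. It implements one possible reading of the procedure: features are ranked by the absolute value of their Pearson correlation with the target, and a feature is kept only if its absolute correlation with every previously kept feature does not exceed 0.7. The use of absolute values, the greedy ordering, and the toy feature names are assumptions of this sketch.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def select_features(features, target, threshold=0.7):
    """Greedy filter: keep the most target-correlated member of each correlated group."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        if all(abs(pearson(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy data: "a" and "b" are strongly correlated with each other, "c" is not.
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
features = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.5],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
    "c": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
}
kept = select_features(features, target)
```

Here "b" tracks the target most closely, so "a" is discarded as redundant while the weakly correlated "c" is retained.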

After excluding correlated variables via the correlation coefficients, we use principal component analysis (PCA) to perform further dimensionality reduction. We select the principal components that retain 99% of the explained variance. After scaling and performing principal component analysis, we obtain 13 principal components for densities, 17 for detonation velocities and 16 for enthalpies of formation. These principal components are used for the cross-validated predictions shown below.

4.2.2. Cross-Validated Predictions

Within this work we evaluated the performance of four different regression models: Linear, Ridge, Kernel Ridge, and MLP-NN regression. After evaluating each model according to the metrics in Section , using the principal components from the previous section, the MLP-NN model was selected for densities and detonation velocities, while the ridge regression model was chosen for enthalpies of formation.

Having determined the optimal regression model for each parameter of interest, we now present the corresponding predictions, using the principal components obtained from Section . For densities, we first consider the CCDC data set with cross-validated predictions made using the MLP regressor, shown in the left of Figure .

Cross-validated predicted densities using the MLP-NN model compared to the experimental densities for the two data sets. See caption of Figure for details.

We find a dramatic improvement compared to Figure . In particular, there are significantly fewer overpredictions at low density, while in the high density regime the magnitude of the errors for underpredictions is much lower. The improved predictive power is also reflected by the improved performance metrics: the coefficient of determination, R ², is increased from 0.5725 to 0.8577, the RMSE reduces from 0.070 to 0.040 g cm^–3, and the MAE reduces from 0.055 to 0.030 g cm^–3. Using the neural network approach, there are 479 (59.8%) excellent, 165 (20.6%) informative, and 157 (19.6%) poor predictions.

Next we show the results of the MLP-NN model for the energetics data set in the right-hand side of Figure . As with the CCDC data set, the results for the energetics data set are highly encouraging. The coefficient of determination is increased from 0.6855 for the trivial model to 0.8674 for our machine learning model. Moreover, the error metrics are reduced by more than half (RMSE = 0.047 g cm^–3, MAE = 0.036 g cm^–3) compared to Figure (RMSE = 0.100 g cm^–3, MAE = 0.081 g cm^–3). For the energetics data set with the neural network model, there are 70 (53.0%) excellent, 24 (18.2%) informative, and 38 (28.8%) poor predictions. More generally speaking, the amount of underprediction in the higher density regime is significantly reduced when applying our neural network-based approach and making cross-validated predictions.

Finally, we train a model on the CCDC data set and use the Muravyev energetics data set as a test set on which predictions are made. The resulting correlation is shown in Figure . Overall, the agreement is quite good, with an R ² much higher than our trivial model in Figure , and reduced error metrics. In comparison to the cross-validated predictions in Figure , the quality of the predictions in Figure is slightly diminished, with an increase in the RMSE/MAE from 0.047/0.036 g cm^–3 to 0.060/0.048 g cm^–3. The coefficient of determination is also slightly reduced from 0.8674 to 0.7848. With regard to the prediction classifications introduced earlier, 44 predictions are excellent, 33 are informative and 55 are poor. We are pleased that our model trained on nonenergetic materials performs adequately for an energetic test set, especially since previous approaches in this regard found a significant drop-off in the predictive ability of their models. For practical applications, however, we note that ML predictions on energetic materials will be improved if the training set does indeed contain energetics.

Predicted densities for the 132 molecules in the Muravyev data set using the MLP-NN regressor, trained on the 801 molecules in the CCDC data set (see caption of Figure for details).

Next, we proceed to detonation velocities using, in the first instance, the experimental loading density (ρ_exp) as input. The cross-validated predictions of detonation velocities using the MLP-NN regressor are shown in Figure . At first glance, our machine learning approach exceeds the performance of the empirical models afforded by the Kamlet–Jacobs equation (eq ) shown in Figure . Using the already optimized version as a reference, the RMSE/MAE are further decreased from 0.374/0.286 to 0.285/0.197 km s^–1. The value of R ² is increased from 0.8865 to 0.9334. Only 9 predictions made with the neural network approach have an absolute error of more than 0.5 km s^–1. Of these, the largest absolute error is 1.5 km s^–1 for MTX-1 with an experimental detonation velocity of only 2.856 km s^–1, shown on the very left in Figure . As discussed in Section , MTX-1 represents a somewhat pathological case with quite nonideal explosive behavior. Similar problems were discussed in ref , whose authors report an average overestimation of D by 1.45 km s^–1 when modeling MTX-1 for different densities and experimental setups. The absolute errors of the remaining 8 samples are all below 0.9 km s^–1. Out of these, we find 7 overpredictions and, interestingly, only one underprediction.

Cross-validated predictions of detonation velocities using the MLP-NN for the 132 molecules in the energetics data set using ρ_exp in the fit. The red solid line indicates perfect agreement with experiment and the blue solid line is a regression line.

The previous results used the experimental loading density (ρ_exp) as input, which is common practice when fitting detonation velocities considering that it is also the one experimental quantity that goes into the Kamlet–Jacobs equation (eq ). It is the purpose of this work to progress one step further and present an all in silico approach to the prediction of detonation velocities, not requiring any experimental input. This approach is, thus, applicable to completely unseen molecules. When developing such a model, we keep one stumbling block in mind. Detonation experiments are usually not performed at the theoretical maximum density (the crystalline density) but at a lower loading density. Which loading density to use is ultimately an experimental decision that we cannot predict via our method. To represent this experimental decision, we use the ratio of the loading density to the crystalline density as an input parameter. When performing predictions on unseen molecules, this fraction also has to be specified. For example, we can specify to compute the detonation velocity for an unseen molecule at 95% of the theoretical maximum density.
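The role of this fraction can be illustrated with a few lines of Python. The helper below is hypothetical and only shows the arithmetic linking a predicted crystalline density to the loading density at a chosen fraction of the theoretical maximum density; the value of 1.80 g cm^–3 is an arbitrary example, not a prediction from our model.

```python
def loading_density(crystal_density, tmd_fraction):
    """Loading density (g/cm^3) at a given fraction of the theoretical maximum density."""
    if not 0.0 < tmd_fraction <= 1.0:
        raise ValueError("tmd_fraction must lie in the interval (0, 1]")
    return crystal_density * tmd_fraction


# Hypothetical predicted crystalline density for an unseen molecule.
predicted_crystal_density = 1.80  # g/cm^3

# Loading densities at 100%, 95%, and 80% of the theoretical maximum density.
densities = {f: loading_density(predicted_crystal_density, f) for f in (1.0, 0.95, 0.80)}
```

In an all in silico setting, the fraction itself (rather than a measured ρ_exp) would be passed to the model together with the computed descriptors.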

Bearing this in mind, we can now move to a presentation of our all in silico results for the detonation velocity, as shown in Figure . We find excellent agreement with almost no deterioration of the fit parameters compared to Figure , which used the experimental loading density in the fit. This highlights how our machine learning approach does indeed combine all the necessary information in this model. Compared to Figure , there is only a slight deterioration in R ² (from 0.9334 to 0.9194) and the error metrics are also only slightly increased. There are now 13 predictions with absolute errors above 0.5 km s^–1. Interestingly, MTX-1 is somewhat improved and overpredicted by only 1.2 km s^–1. The second largest error is for BTATz [3,6-bis(1H-1,2,3,4-tetrazol-5-ylimino)-1,2,4,5-tetrazine], overpredicted by 1.1 km s^–1. All other errors lie within 0.9 km s^–1 with, again, mostly overpredictions. Finally, reviewing Figure , we want to highlight that our all in silico model is still clearly better than using the Kamlet–Jacobs equation (eq ), despite the latter using ρ_exp as experimental input.

Cross-validated predictions of detonation velocities using the MLP-NN for the 132 molecules in the energetics data set using no experimental input data. The red solid line indicates perfect agreement with experiment and the blue solid line is a regression line.

Finally, we turn to our machine learning model for solid state enthalpies of formation, shown in Figure . In comparison to our trivial approach to modeling this property in Figure , we notice that there is significantly less underprediction for the solid state enthalpies of formation and overall a very good correlation. Earlier, we suggested that the error in the trivial model may have arisen from our choice to neglect the sublimation enthalpy from our model, and also as a result of our use of the gas-phase enthalpy of C₆₀ as an approximation to obtain the value for solid carbon. Presently, it appears that we have been able to circumvent these limitations through the use of our machine learning model, as evidenced by the markedly improved predictions. The coefficient of determination, R ² is 0.9457 while the RMSE and MAE are 79.0 and 52.1 kJ mol^–1 respectively; a reduction of more than half compared to the trivial model.

Cross-validated predictions of solid state enthalpies of formation using ridge regression for the 124 molecules in the energetics data set for which reference data is available. The red solid line indicates perfect agreement with experiment and the blue solid line is a regression line.

4.3. Application Demonstration: Cyclobutane Nitric Esters

In order to demonstrate the capabilities of our method, the machine learning model has been applied to a set of isomeric cyclobutane nitric ester derivatives as initially described by Barton et al. The molecular structures of the six molecules are presented in Figure . It was the goal of ref to investigate how much these isomeric structures differ in their relevant properties. Before concluding, we want to briefly investigate our approach on this set of molecules. We do not use any experimental reference data on these molecules and will, now, test our purely in silico approach for predicting their properties.

Depiction of cyclobutane nitric esters used to exemplify our all in silico prediction for densities (ρ, g/cm³), detonation velocities (D, km/s), and heats of formation (ΔH _f in kJ/mol). The three presented D values are given, in order, for loading densities of 100%, 95%, and 80% of the theoretical maximum density.

Our model presents individual predictions on all six molecules. We find density predictions between 1.632 g/cm³ and 1.658 g/cm³. These are similar to the X-ray crystal densities reported in ref , although our numbers show somewhat lower variability. Next, we present detonation velocities determined for 100%, 95%, and 80% of the theoretical maximum density. For results at 100%, we obtain values around 7.5 km/s, again very similar to ref . Using our model, we can also readily compute values at 95% and 80% of the maximum density, highlighting a significant decrease in detonation velocity. Finally, the heats of formation are presented, all between 502 and 511 kJ/mol, yielding similar values to ref . In summary, we can conclude that our model readily produces very reasonable numbers for completely unseen molecules.

5. Conclusions and Outlook

We have highlighted modeling approaches for properties of energetic materials using extended quantum chemistry data as a starting point. The properties of interest were the crystalline density, the detonation velocity, and the solid state enthalpy of formation. We first evaluated trivial models before showing how one can construct machine learning models using descriptors from our quantum chemistry calculations as well as additional structural descriptors.

Starting with the density, the trivial mass per unit volume approach (eq ) was generally found to provide only a very rough approximation, particularly for molecules with high densities (ρ > 1.7 g cm^–3). This result is unsurprising given that the single-molecule picture neglects any information about the through-space interactions which would be present in the crystalline environment. Subsequent quantum chemistry computations were intended to produce descriptors reflecting the propensity of the molecule to interact with its environment. Indeed, using these feature sets combined with a multilayer perceptron neural network model, we found a significant improvement for the density data sets studied.

Before proceeding to detonation velocities, we have highlighted the drawbacks of rule-based schemes for determining detonation products, mainly that they are not a “one-size-fits-all” solution, particularly where they are incompatible with particular molecular compositions. Examples in the case of the Kamlet–Jacobs rules are molecules with an excess or lack of oxygen atoms, or molecules containing anything other than CHNO atoms. To overcome this problem, we have introduced a product optimization approach that maximizes the heat release during the explosion by construction. This scheme was shown to provide improved results in conjunction with the Kamlet–Jacobs equation while also providing a solid basis for our ML approach, allowing it to be extended to arbitrary molecular compositions.

Our predicted detonation velocities, using first the experimental loading densities as input, provided excellent correlation with an R ² of 0.9334 and RMSE/MAE of 0.286/0.197 km s^–1. We proceeded to an all in silico model not requiring any external input highlighting that it produced results of almost the same quality as when using experimental loading densities. Before concluding, we illustrated our method on a set of six isomeric cyclobutane nitric esters presenting all in silico predictions on these molecules.

Finally, we studied solid-state enthalpies of formation. The computed gas phase enthalpies already provided a reasonable correlation but, unsurprisingly, also showed a strong systematic shift. Proceeding to the ridge regression model, we found a vastly improved predictive model, with a 5% increase in R ² to 0.9457, and around half the RMSE and nearly three times lower MAE.

In summary, we have highlighted the power of quantum chemistry in connection with ML models to predict energetic materials properties. We have shown a variety of quantum chemistry data that can be successfully used to augment the models to incorporate information about crystal packing and detonation dynamics. More specifically, we have highlighted the power of a new nonempirical product optimization scheme. We believe that these advancements present an important step in the quest for improved materials predictions. In the future, it will be interesting to extend the presented approach to other detonation parameters as well as bulk materials properties, such as melting points.

Acknowledgments

This work was supported by the Defence Science and Technology Laboratory (Dstl), Requisition No. RQ00000010147.

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jctc.5c00865.

Quantum chemistry and experimental data underlying the presented machine learning models (ZIP)
Further description of the data provided and notes on data curation (PDF)

The authors declare no competing financial interest.





1. Descriptors Used in This Work from ORCA DFT (r²SCAN-3c) Calculations.