Nature Communications. 2020 Nov 11;11:5713. doi: 10.1038/s41467-020-19497-z

Complex reaction processes in combustion unraveled by neural network-based molecular dynamics simulation

Jinzhe Zeng, Liqun Cao, Mingyuan Xu, Tong Zhu, John Z H Zhang
PMCID: PMC7658983  PMID: 33177517

Abstract

Combustion is a complex chemical system which involves thousands of chemical reactions and generates hundreds of molecular species and radicals during the process. In this work, a neural network-based molecular dynamics (MD) simulation is carried out to simulate the benchmark combustion of methane. During MD simulation, detailed reaction processes leading to the creation of specific molecular species including various intermediate radicals and the products are intimately revealed and characterized. Overall, a total of 798 different chemical reactions were recorded and some new chemical reaction pathways were discovered. We believe that the present work heralds the dawn of a new era in which neural network-based reactive MD simulation can be practically applied to simulating important complex reaction systems at ab initio level, which provides atomic-level understanding of chemical reaction processes as well as discovery of new reaction pathways at an unprecedented level of detail beyond what laboratory experiments could accomplish.

Subject terms: Computational chemistry, Molecular dynamics, Reaction mechanisms


Gaining insights into combustion processes is challenging due to the complex reactions involved. The present work proposes a neural network potential model trained on ab initio data that enables the simulation of methane combustion by predicting reactants, products and reaction intermediates.

Introduction

Ever since learning to use fire, human beings have never stopped studying combustion. With increasingly serious concern about environmental pollution from combustion, understanding and mastering combustion mechanisms is of great importance. Gaining fundamental insights into combustion processes can help us design more efficient engines and minimize the production of pollutants. A typical combustion process may involve hundreds of chemical species and thousands of elementary chemical reactions. In particular, combustion occurs under extreme physical conditions, with high pressures and temperatures of up to several thousand degrees. Also, many elementary reactions in combustion typically occur on sub-picosecond time scales. These extreme conditions make it very difficult, if not impossible, to carry out real-time experimental studies of combustion. Thus, most experimental investigations of chemical reaction mechanisms focus on individual reactions instead of the complex reaction processes occurring in combustion. In the past decades, in silico experiments such as reactive molecular dynamics (MD) simulations have shown their value in providing molecular (atomic)-level insights into combustion mechanisms. In a reactive MD simulation, the reaction conditions can be easily controlled, and some supercritical conditions that are difficult to achieve experimentally can also be handled. Compared with traditional theoretical approaches such as transition state theory and quantum collision theory, which focus on studying a single reaction, reactive MD simulation can construct the entire interwoven reaction network of a combustion system1. The heart of a reactive MD simulation is the potential energy surface (PES), which describes the inter- and intra-molecular interactions of the molecules.
Currently, there are two main classes of methods that can be used to construct the PES of a given molecular system: quantum mechanics (QM)-based methods and empirical force fields. Quantum mechanics is undoubtedly more rigorous and accurate, and MD simulations based on it are known as ab initio MD (AIMD) simulations2,3. Although the AIMD method can in principle simulate complex chemical reactions in real time, it is limited to relatively small systems and short simulation times (typically, dozens of picoseconds) due to the exorbitant computational cost of on-the-fly ab initio calculations. With the rapid development of computer hardware and algorithms, especially the use of graphics processing units (GPUs), some AIMD methods have recently begun to handle larger chemical systems4. But so far, it is still impractical to use AIMD to simulate large-scale complex reaction systems such as combustion. Over the past decades, many reactive force fields (or PESs) have been developed and successfully used for various reactive molecular systems5–12. A comprehensive discussion of these reactive force fields can be found in refs. 13,14. Among these force fields, the empirical ReaxFF has been widely used in MD simulations of combustion systems due to its computational efficiency15, but its accuracy and reliability are of significant concern16–18. The key points in developing a reactive force field are the choice of the functional form and the parameterization process, which are complicated and depend on human intervention.

Recently, more researchers have turned to machine-learning (ML) methods. ML methods, especially artificial neural networks (NNs), make it possible to construct PESs with the accuracy of QM methods but with an efficiency comparable to that of force fields. Neural networks constitute a very flexible and unbiased class of mathematical functions, which in principle can approximate any real-valued function to arbitrary accuracy. Since Behler and Parrinello proposed the high-dimensional neural network approach19,20, several methods have been developed to implement it, and many different kinds of NN PESs have been proposed for water, small organic molecules, and metalloid materials21–25. Examples include the sGDML26–28, SchNet29, PhysNet30, and FCHL31 methods. NN potentials have also been employed to study the reaction mechanisms of chemical systems. By combining high-precision NN PESs and quantum collision theory, Zhang and Jiang’s groups have studied a series of elementary reactions in the gas phase and on surfaces32–35. Liu and co-workers developed the LASP program to study heterogeneous catalysis with NN PESs36 and built stochastic surface walking (SSW)-NN to explore reaction pathways from glucose to 5-hydroxymethylfurfural37. Brickel et al. also studied the nucleophilic substitution reaction [Cl–CH3–Br] in water with an NN potential38.

In this report, we present an in silico simulation of methane combustion based on an NN potential derived by training a high-dimensional NN model on ab initio computed energies. To achieve high efficiency and accuracy, the DeePMD model was used39–41. This NN PES can accurately predict the energy and atomic forces of reactants, products and reaction intermediates. Based on this model, a 1-ns reactive MD simulation was performed with sub-femtosecond time resolution for a combustion system initially containing 100 methane and 200 oxygen molecules (Fig. 1). A complete reaction network of methane combustion can be constructed from the MD trajectory. The simulation not only reproduced the main reaction pathways that are consistent with experiment but also provided much more detailed insights into the combustion processes, as described in the following.

Fig. 1. Real-time dynamics of methane combustion.

a Snapshots of part of the combustion system extracted from the reactive MD simulation of methane combustion (the time interval is 0.2 ns). The main molecular species CH4, O2, H2O and CO2 are colored in cyan, red, blue and black, respectively. Other molecular species are colored in white. One can see that the number of reaction products increased continuously while the reactants were being consumed. b Time dependence of the numbers of the main molecular species in the real-time MD simulation. The curves are smoothed for clarity.

Results

Accuracy of the NN PES

The performance of the NN potential depends strongly on the quality of the reference datasets. Although several databases, such as QM7 (ref. 42), QM9 (ref. 43), ANI-1 (ref. 44), and ANI-1x (ref. 45), are accessible, they mainly include organic molecules and are therefore not suitable for this work: the combustion of methane generates many molecular fragments, and many of them are free radicals46. Therefore, we followed a workflow (details are given in the “Methods” section) to construct the reference datasets for the combustion. The DeepPot-SE model47 was then used to train the NN PES on these reference data. The predictive power of the NN model is shown in Supplementary Table 1 and Supplementary Fig. 1. The DFT energies are accurately reproduced by the NN model: the mean absolute errors (MAEs) are only 0.04 and 0.14 eV/atom in the training set and the test set, respectively. The atomic forces predicted by the NN model are also highly consistent with the DFT results (Supplementary Fig. 1). The correlation coefficient is 0.999 and the MAE is 0.12 eV/Å. Considering that there are a large number of atomic and molecular collisions during the combustion process, and that some atomic forces can be as high as dozens of eV/Å, the accuracy of the NN model is encouraging. To verify the energy conservation of the NN PES, we performed a reactive MD simulation in the NVE ensemble. The system is a periodic box containing 100 CH4 molecules and 200 O2 molecules (a total of 900 atoms) with a density of 0.25 g/cm3. As shown in Supplementary Fig. 2, the total energy is conserved in the MD simulation.
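
The two accuracy measures quoted above (MAE and correlation coefficient) can be computed as in the following minimal sketch. The force values are made up for illustration and are not taken from the paper's datasets, and the function names are hypothetical.

```python
import math


def mean_absolute_error(reference, predicted):
    """Average of |predicted - reference| over paired values."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for r, p in zip(reference, predicted)) / len(reference)


def pearson_r(reference, predicted):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(reference)
    mean_ref = sum(reference) / n
    mean_pred = sum(predicted) / n
    cov = sum((r - mean_ref) * (p - mean_pred) for r, p in zip(reference, predicted))
    var_ref = sum((r - mean_ref) ** 2 for r in reference)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return cov / math.sqrt(var_ref * var_pred)


# Made-up force components in eV/Å (illustrative only, not data from the paper).
dft_forces = [-2.0, -0.5, 0.0, 1.5, 3.0]
nn_forces = [-1.9, -0.6, 0.1, 1.4, 3.1]

mae = mean_absolute_error(dft_forces, nn_forces)
r = pearson_r(dft_forces, nn_forces)
print(f"MAE = {mae:.3f} eV/Å, r = {r:.4f}")
```

In practice the same functions would be applied to every force component of every atom in the test set.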

The initial stage of combustion

A 1 ns reactive MD simulation of methane combustion was performed with the NN PES in the NVT ensemble. The system is again a periodic box containing 100 CH4 molecules and 200 O2 molecules (a total of 900 atoms) with a density of 0.25 g/cm3. The MD simulations were run with a time-step of 0.1 fs and the temperature was kept at 3000 K by using the Berendsen thermostat. We chose a relatively high density (and thus high pressure) and high temperature to enhance the collision probability and sampling efficiency, which is a widely used strategy in reactive MD simulations because the time scale of the simulation is much shorter than that of experiments. In fact, experiments usually do not use pure fuel for combustion, but rather mix the fuel into a relatively inert gas for safety. In future work, we will try to combine the NN potential with enhanced sampling algorithms to make the simulated conditions more realistic.
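
To give a feel for the scale of this setup, the sketch below estimates the box edge implied by the stated composition and density, and the number of integration steps implied by the stated simulation length and time-step. It assumes a cubic box (the text only says "periodic box") and uses standard molar masses; the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # 1/mol
MASS_CH4 = 16.043         # g/mol, standard molar mass
MASS_O2 = 31.998          # g/mol, standard molar mass


def cubic_box_length_angstrom(n_ch4, n_o2, density_g_cm3):
    """Edge length (Å) of a cubic box holding the molecules at the given density."""
    total_mass_g = (n_ch4 * MASS_CH4 + n_o2 * MASS_O2) / AVOGADRO
    volume_cm3 = total_mass_g / density_g_cm3
    volume_a3 = volume_cm3 * 1e24  # 1 cm^3 = 1e24 Å^3
    return volume_a3 ** (1.0 / 3.0)


box = cubic_box_length_angstrom(100, 200, 0.25)
n_atoms = 100 * 5 + 200 * 2          # 5 atoms per CH4, 2 per O2
n_steps = round(1e-9 / 0.1e-15)      # 1 ns divided by 0.1 fs

print(f"box edge ≈ {box:.1f} Å, atoms = {n_atoms}, MD steps = {n_steps}")
```

Under the cubic-box assumption the edge comes out at a little under 4 nm, and the 1 ns trajectory corresponds to ten million force evaluations.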

Figure 1b and Supplementary Fig. 3 show the time-dependent progression of the main molecular species during the MD simulation. After 1 ns, about 90 CH4 and 150 O2 molecules had been consumed and about 160 H2O, 30 CO, and 50 CO2 molecules had been produced. The potential energy of the system during the simulation is shown in Supplementary Fig. 4. Although the system has not reached equilibrium, the important ignition process, which contains much richer reaction information, has already completed. In order to describe the complicated reaction network in more detail, we divided the combustion process into three stages, namely the initial stage of the combustion, the production of the intermediate species formaldehyde and formyl radical, and the production of CO and CO2.

The reaction network in the initial stage of the combustion is shown in Fig. 2a. The combustion of methane started with the abstraction of a hydrogen atom by O2 to generate the two radicals ·CH3 and HOO· (R3). As seen in Fig. 2b, this process started at about 32 ps and took about 0.2 ps to finish. During the simulation, other radicals such as ·OH, ·H, and HOO· also abstracted a hydrogen atom from CH4 to generate the ·CH3 radical. Among them, the ·OH radical is the main abstracting species, generating water molecules (R1). The dissociation of methane into ·H and ·CH3 was also observed.

Fig. 2. The initial stage of combustion.

a Main reaction pathways in the initial stage of the combustion. b A real-time trajectory showing the reaction process of hydrogen abstraction from methane by O2. Atoms in cyan, red and gray colors are carbon, oxygen and hydrogen, respectively. c A real-time trajectory showing the reaction process leading to the creation of methanol. Definition of colored atoms is the same as in (b).

Many ·CH3 radicals reacted with ·OH radicals to form methanol molecules (R6). According to Fig. 2c, this process was also very quick. Some ·CH3 radicals reacted with O2 and HOO· to form methyldioxidanyl (CH3OO·, R4) and methyl-hydroperoxide (CH3OOH, R5). Radicals such as ·OH can also abstract H atoms from ·CH3 and produce :CH2. Methanol can further react with ·OH and ·H to generate methoxy radicals (CH3O·, R10, R11), H2O and H2. It can also react with ·H to generate ·CH2OH and H2 (R12). CH3O· can also be produced by the reaction of CH3OO· or CH3OOH with ·H (R8 and R9).

Production of formaldehyde and formyl radicals

Most methoxy radicals generated in the previous stage were converted to formaldehyde, mainly through two reaction pathways (Fig. 3a). In the first, a methoxy radical reacts with ·OH to form formaldehyde and H2O (R16). As shown in Fig. 3b, this process took about 0.3 ps. In the other pathway, a methoxy radical reacts with ·H to generate formaldehyde and H2 (R17). A ·CH2OH radical can also convert to formaldehyde by losing the hydrogen atom on its hydroxyl group (R14 and R15). If it loses one H atom on the methylene group, it generates :CHOH (R13). In addition, :CH2 can react with ·OH and form formaldehyde and the methylidyne radical (R18 and R19).

Fig. 3. Production of formaldehyde and formyl radicals.

a The main reaction pathways for the formation of formaldehyde and formyl radicals. b The real-time trajectory of the reaction CH3O· + ·OH → CH2O + H2O. Atoms in cyan, red and gray colors are carbon, oxygen and hydrogen, respectively. c The real-time trajectory of the reaction CH2O + ·OH → ·CHO + H2O. Definition of colored atoms is the same as in (b).

Formaldehyde was further converted into formyl (·CHO) radicals. The main reaction pathways are hydrogen abstraction by ·O and ·OH. Figure 3c shows the trajectory of the reaction CH2O + ·OH → ·CHO + H2O. An ·OH radical approaches the rotating formaldehyde molecule and snatches an H atom to form a water molecule; the whole process takes about 0.4 ps. In addition, other species such as ·H, O2, HOO·, and ·CH3 also abstracted a hydrogen atom from formaldehyde to form formyl radicals. R20 and R23 are two reactions that form formyl radicals without the participation of formaldehyde.

Production of CO and CO2

A formyl radical can convert to CO by losing hydrogen in two ways (Fig. 4a). First, it can lose an H atom directly (R25). Figure 4b shows a real-time trajectory of this process. A formyl radical lost its H atom at about 405.79 ps, but this reaction was quickly reversed and the formyl radical was re-formed. After another 0.4 ps the reaction took place again to form CO. Second, ·OH can abstract the H atom from the formyl radical and generate H2O and CO (R26).

Fig. 4. Production of CO and CO2.

a Main reaction pathways for the formation of CO and CO2. b The real-time trajectory of the reaction ·CHO → CO + ·H. Atoms in cyan, red and gray colors are carbon, oxygen and hydrogen, respectively. c The real-time trajectory of the reaction CO + ·OH → CO2 + ·H. Definition of colored atoms is the same as in (b).

The formyl radical can combine with the ·OH radical to form formic acid (R24), which can further lose an H atom to form ·COOH (R27) or HCOO· (R30). These two species can convert to CO2 through reaction with ·OH or ·H (R29 and R31). The ·COOH radical can also react with ·H and generate CO and H2O (R28). Figure 4c shows the trajectory of the reaction CO + ·OH → CO2 + ·H (R32). At 815.32 ps, an ·OH radical started to approach a CO molecule, and at 815.38 ps, a COOH intermediate was formed. This COOH intermediate appeared relatively inactive: it persisted for about 0.1 ps before finally losing an H atom to become CO2.

Further analysis showed that all of the above-mentioned 32 reactions have been found experimentally, and the reaction networks constructed from them are also highly consistent with the main reaction networks found experimentally48,49. In total, we detected 505 molecular species and 798 reactions in the trajectory. Species such as ethane, ethylene, and acetylene can also be found in the experimental database. In all, 130 of the 798 reactions extracted from the MD trajectory are included in the widely accepted GRI_Mech experimental mechanism library48. Some experimentally observed reactions were not observed in our simulation, most likely because the present simulation was performed at a relatively high temperature.
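
How reactions were extracted from the trajectory is not detailed in this section. As a rough illustration of the idea only, the sketch below compares the molecules present in consecutive frames and groups those that exchanged atoms into one reaction event. The frame representation (a dict from a frozenset of atom indices to a species name) and the function name are assumptions made for this example; a production analysis would additionally need to filter out transient bond fluctuations.

```python
from collections import Counter


def detect_reactions(frames):
    """Count reaction events between consecutive frames.

    frames: list of dicts mapping frozenset(atom indices) -> species name.
    Returns a Counter keyed by strings such as "CH4 + O2 -> CH3 + HO2".
    """
    reactions = Counter()
    for before, after in zip(frames, frames[1:]):
        consumed = [m for m in before if m not in after]
        formed = [m for m in after if m not in before]
        # Group consumed and formed molecules that share atoms into one event.
        while consumed:
            atoms = set(consumed[0])
            left, right = [], []
            changed = True
            while changed:
                changed = False
                for pool, side in ((consumed, left), (formed, right)):
                    for mol in pool[:]:
                        if atoms & mol:
                            atoms |= mol
                            side.append(mol)
                            pool.remove(mol)
                            changed = True
            lhs = " + ".join(sorted(before[m] for m in left))
            rhs = " + ".join(sorted(after[m] for m in right))
            reactions[f"{lhs} -> {rhs}"] += 1
    return reactions


# Toy two-frame trajectory: atom 0 is C, atoms 1-4 are H, atoms 5-6 are O (of O2),
# atoms 7-9 form a spectator water molecule.
frame0 = {frozenset({0, 1, 2, 3, 4}): "CH4", frozenset({5, 6}): "O2",
          frozenset({7, 8, 9}): "H2O"}
frame1 = {frozenset({0, 1, 2, 3}): "CH3", frozenset({4, 5, 6}): "HO2",
          frozenset({7, 8, 9}): "H2O"}

events = detect_reactions([frame0, frame1])
print(events)
```

The toy frames mimic the hydrogen abstraction from methane by O2 discussed earlier; the unchanged water molecule is ignored because it appears identically in both frames.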

In fact, discovering new reactions is an important advantage of the present approach. For methane oxidation, a system that has been extensively studied by experiments, NN-based reactive MD can still discover hundreds of chemical reactions that have not been experimentally reported. This demonstrates that reactive MD can be a powerful tool to study combustion reactions. Interestingly, we found a cyclopropene molecule in the trajectory, which has not been reported to our knowledge. As shown in Supplementary Fig. 5, at 634.09 ps, a CO molecule collided with a ·CH3 radical and joined together. Then a CH2CO molecule was formed through hydrogen loss. The CH2CO was stable for about 200 ps and then combined with another ·CH3 radical. Subsequent hydrogen loss led to the formation of a cycloprop-2-en-1-one molecule at 828.65 ps. After another 60 ps, the third ·CH3 attacked the cycloprop-2-en-1-one molecule and kicked out the CO group to form the CH3CCH2 molecule at 889.50 ps. Through further internal reaction and hydrogen loss, it finally formed a cyclopropene molecule at 891.16 ps and remained stable throughout the rest of the simulation. The entire process took about 260 ps to complete. While it might be possible that finding cyclopropene in our simulation is a coincidence or driven by the relatively high temperature, it still illustrates the ability of reactive MD simulation to discover new molecules and new reactions.

Discussion

Accurate in silico MD simulation of combustion or other complex chemical reactions is one of the ultimate goals of computational chemistry. In this work, an artificial neural network potential model trained on ab initio data describes complex chemical reactions in methane combustion. This NN potential model is orders of magnitude faster than conventional DFT calculations. Benefiting from the high efficiency of the NN model and GPU acceleration, a nanosecond-scale MD simulation of a chemical system containing 900 atoms was completed in about 4 days on an NVIDIA Tesla P100 card. Detailed reaction mechanisms were extracted from the MD trajectory, and the detected molecular species and reaction networks are in excellent agreement with experimental observation. In addition, many new reactions were found that are not included in the experimental database. Compared to laboratory experiments, in silico simulations can be performed under more extreme conditions, and any specific reaction of interest can be easily detected and tracked. In addition, MD simulation can achieve ultra-high time resolution. The time-step used in this work is 0.1 fs. With the improvement of algorithms and hardware, even finer time resolution can be achieved.

Compared with traditional theoretical approaches based on prior knowledge, reactive MD simulation can explore complex reaction networks and discover new reactions and species without any prior knowledge of the reactions. Indeed, complex reactions cannot be well understood without considering the kinetics of the reaction network they belong to. Since reactive MD simulation tracks all chemical reactions in real time, one can even deduce the rate constants of individual reactions from a single MD trajectory by statistical analysis. We extracted the ten most statistically significant reactions from the trajectory and calculated their rate constants based on the algorithms developed in previous studies50,51. As shown in Supplementary Table 2, most of the rate constants agree well with the GRI_Mech data48. The main sources of error are likely the uncertainties of the parameters in the Arrhenius formula and the completeness of sampling. Ideally, one should run many trajectories with different initial conditions to obtain truly statistically accurate results. Nevertheless, although these rates may not be accurate enough to be used directly in kinetic modeling, they can contribute to a comprehensive understanding of the combustion reaction.
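
The algorithms of refs. 50,51 are not reproduced in this text. As a hedged illustration of how a rate constant can be estimated from event counts in a single trajectory, the sketch below uses a simple mass-action estimate for a bimolecular reaction A + B → products: the number of observed events is divided by the time-integrated product of the reactant counts. The function name and all input numbers are hypothetical.

```python
AVOGADRO = 6.02214076e23  # 1/mol


def bimolecular_rate_constant(n_events, counts_a, counts_b, dt_s, volume_cm3):
    """Estimate k in cm^3 mol^-1 s^-1 for A + B -> products.

    Mass action gives dN_events/dt = k' * N_A * N_B / V, so
    k' = N_events * V / integral(N_A * N_B dt) in cm^3 molecule^-1 s^-1.
    """
    exposure = sum(a * b for a, b in zip(counts_a, counts_b)) * dt_s
    if exposure == 0:
        raise ValueError("reactants never coexist; rate constant is undefined")
    k_per_molecule = n_events * volume_cm3 / exposure
    return k_per_molecule * AVOGADRO


# Made-up example: 10 A and 5 B molecules present in each of 1000 frames
# spaced 1 ps apart, 4 reaction events observed, box volume 5.316e-20 cm^3.
k = bimolecular_rate_constant(
    n_events=4,
    counts_a=[10] * 1000,
    counts_b=[5] * 1000,
    dt_s=1e-12,
    volume_cm3=5.316e-20,
)
print(f"k ≈ {k:.2e} cm^3 mol^-1 s^-1")
```

With so few events the statistical uncertainty is large, which is consistent with the remark above that many trajectories would be needed for statistically accurate rates.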

A practical issue worth pointing out is that, although some algorithms were used in this study to minimize the size of the reference dataset, there are still 578,731 structures in the reference set. Although DFT calculations are very efficient, high-level post-Hartree−Fock calculations are difficult to perform for such a large reference set. In order to further minimize the size of the reference set while ensuring its completeness, new algorithms need to be developed to further enhance the efficiency of this approach. Recently, Zhang et al. developed the DP-GEN52 (Deep Potential Generator) software platform, which can automatically construct the reference dataset and train the NN model. The concurrent learning algorithm employed by this platform can make the redundancy of the reference set as small as possible. We are trying to integrate the algorithms developed in this work into the DP-GEN platform.

In addition, it is worth pointing out that while combustion is usually thought to be dominated by free radical reactions, recent studies have begun to examine the role of electronically excited species in combustion. For example, the additional introduction of plasma was found to be effective in promoting combustion in experiments53. However, MD simulations involving excited states are highly nontrivial, and there are large uncertainties in ab initio quantum chemistry computations of excited states of large systems. Based on sophisticated empirical or machine-learning PESs, several recent works have achieved excited-state MD simulations for model systems54–62. One example is the O+O recombination reaction to form ground-state and excited-state singlet O2 molecules on amorphous solid water60. Such a strategy will be considered in our future studies.

Although further improvement is needed, the current report heralds the dawn of a new era in which neural network-based reactive MD simulation can be practically applied to simulating complex reaction systems at the ab initio level, providing atomic-level understanding of every reaction process at an unprecedented level of detail beyond what laboratory experiments can accomplish.

Methods

Reference dataset

In this study, a workflow was developed for making reference datasets (Fig. 5). The details of each module in the workflow are given below.

Fig. 5. The workflow of reference dataset construction.

The process and steps used in this study to generate the reference dataset needed for neural network training to generate the potential energy for MD simulation.

To increase the efficiency of dataset construction, reactive MD simulation with ReaxFF was used to sample an initial dataset. A model combustion system containing a large number of CH4 and O2 molecules was built using the Amorphous Cell module in the Materials Studio63 software package. Then the LAMMPS64 program was used to perform the MD simulation. The NVT ensemble was used and the temperature was set to 3000 K with the Berendsen thermostat. The ReaxFF parameters of Chenoweth et al. (CHO-2008 parameter set)65 were employed. The Open Babel software66 and the Depth-First Search algorithm67 were used to detect species in every snapshot of the trajectory. Then, for each atom in each snapshot, we built a molecular cluster that contains this atom and the species within a specified cutoff centered on it. In this work, the cutoff was set to 5 Å.
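
The species-detection step can be pictured as finding connected components of the bond graph with a depth-first search. The sketch below is a minimal illustration under the assumption that a bond list is already available (in the workflow above, bond perception is handled by Open Babel, which is not called here); the function names are hypothetical.

```python
from collections import Counter


def find_species(n_atoms, bonds):
    """Return molecules as sorted lists of atom indices (connected components)."""
    adjacency = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)
    visited = [False] * n_atoms
    species = []
    for start in range(n_atoms):
        if visited[start]:
            continue
        # Iterative depth-first search from an unvisited atom.
        stack = [start]
        visited[start] = True
        molecule = []
        while stack:
            atom = stack.pop()
            molecule.append(atom)
            for neighbor in adjacency[atom]:
                if not visited[neighbor]:
                    visited[neighbor] = True
                    stack.append(neighbor)
        species.append(sorted(molecule))
    return species


def formula(atom_indices, elements):
    """Simple formula string: C first, then H, then other elements alphabetically."""
    counts = Counter(elements[i] for i in atom_indices)
    order = [e for e in ("C", "H") if e in counts]
    order += sorted(e for e in counts if e not in ("C", "H"))
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


# Toy snapshot: one methane, one O2 and one isolated H atom.
elements = ["C", "H", "H", "H", "H", "O", "O", "H"]
bonds = [(0, 1), (0, 2), (0, 3), (0, 4), (5, 6)]

molecules = find_species(len(elements), bonds)
names = [formula(m, elements) for m in molecules]
print(names)
```

An iterative stack is used instead of recursion so that large molecules do not hit Python's recursion limit.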

The initial dataset contains about 22.5 million structures, which is too many to perform QM calculations for every molecular cluster it contains. Therefore, it is necessary to resample it to remove redundant structures while ensuring its completeness. To this end, we first classified the initial dataset into sub-datasets based on the chemical bond information of the central atom. For example, the central H atom can be classified into two different types: a single H atom (H0) and an H atom that forms a single chemical bond with another atom (H1).
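
A minimal sketch of this classification step is given below. It assumes that the "chemical bond information" can be summarized by the element and the number of bonds of the central atom, which matches the H0/H1 example but may be simpler than the labeling actually used; the record layout and function names are hypothetical.

```python
from collections import defaultdict


def central_atom_label(element, n_bonds):
    """Label such as 'H0' (isolated H atom) or 'H1' (H with one bond)."""
    return f"{element}{n_bonds}"


def split_into_subdatasets(clusters):
    """Group cluster ids by the label of their central atom.

    clusters: iterable of (cluster_id, central_element, n_bonds_of_central_atom).
    """
    subsets = defaultdict(list)
    for cluster_id, element, n_bonds in clusters:
        subsets[central_atom_label(element, n_bonds)].append(cluster_id)
    return dict(subsets)


# Made-up cluster records for illustration.
clusters = [
    (0, "H", 0),  # free H atom
    (1, "H", 1),  # H bonded to one neighbor
    (2, "C", 4),  # carbon in CH4
    (3, "C", 3),  # carbon in a CH3 radical
    (4, "H", 1),
]
subdatasets = split_into_subdatasets(clusters)
print(subdatasets)
```

Each resulting sub-dataset can then be resampled independently, so that rare bonding environments are not swamped by common ones.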

Further treatment is still needed for large sub-datasets. For a given large sub-dataset, we first expressed each molecular cluster it contains as a Coulomb matrix68:

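$$
C_{ij} =
\begin{cases}
\dfrac{1}{2} Z_i^{2.4}, & i = j \\[6pt]
\dfrac{Z_i Z_j}{\left| \mathbf{R}_i - \mathbf{R}_j \right|}, & i \neq j
\end{cases}
\tag{1}
$$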

where Zi and Zj are the nuclear charges of atoms i and j, and Ri and Rj are their Cartesian coordinates. The minimum image convention69 was used to account for the periodic boundary conditions. “Invisible atoms” were introduced to fix the dimension of the Coulomb matrix. These invisible atoms do not influence the physics of the molecule of interest and make the total number of atoms in the molecule sum to a constant. To lower the dimension of the dataset while keeping as much structural information as possible, the Coulomb matrix was further represented by its eigen-spectrum, which is obtained by solving the eigenvalue problem $C\mathbf{v} = \lambda \mathbf{v}$ under the constraint $\lambda_i \geq \lambda_{i+1}$. The clustering algorithm Mini Batch KMeans70 was then used to cluster the given sub-datasets into smaller clusters according to the eigen-spectrum. We then randomly selected 10,000 structures from each cluster (if a cluster contains no more than 10,000 structures, all of them were selected).
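
The descriptor described above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' code: the box is taken to be cubic, "invisible atoms" are modeled as padding entries with zero nuclear charge, and the function name is hypothetical.

```python
import numpy as np


def coulomb_eigenspectrum(charges, coords, box_length, max_atoms):
    """Sorted eigenvalues (descending) of a zero-padded Coulomb matrix."""
    n = len(charges)
    if n > max_atoms:
        raise ValueError("max_atoms must be at least the number of real atoms")
    z = np.zeros(max_atoms)
    z[:n] = charges                      # padding atoms keep charge zero
    r = np.zeros((max_atoms, 3))
    r[:n] = coords
    # Pairwise displacement vectors with the minimum image convention.
    delta = r[:, None, :] - r[None, :, :]
    delta -= box_length * np.round(delta / box_length)
    dist = np.linalg.norm(delta, axis=-1)
    # Zero distances occur on the diagonal and between padding atoms; the
    # diagonal is overwritten below and padding pairs have zero charge product.
    dist[dist == 0.0] = 1.0
    c = np.outer(z, z) / dist
    np.fill_diagonal(c, 0.5 * z ** 2.4)
    # eigvalsh returns ascending eigenvalues of a symmetric matrix.
    return np.linalg.eigvalsh(c)[::-1]


# Toy water-like cluster (coordinates in Å, made up for illustration).
charges = [8, 1, 1]
coords = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
spectrum = coulomb_eigenspectrum(charges, coords, box_length=37.6, max_atoms=5)
shifted = coulomb_eigenspectrum(charges, coords + 30.0, box_length=37.6, max_atoms=5)
print(spectrum)
```

The resulting fixed-length vectors are what a clustering routine such as scikit-learn's `MiniBatchKMeans` would receive as input; the shifted copy shows that the spectrum does not depend on where the cluster sits in the box.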

Large-amplitude collisions and reactions in combustion can produce many unpredictable species and intermediates. To ensure the completeness of the reference dataset, an active learning approach71 was used. Four different NN PES models were trained on the dataset from the previous step. Several short MD simulations were then performed based on these NN models. During the simulations, the atomic forces were evaluated by the four NN PES models simultaneously. For a specific atom, if the forces predicted by the four models are consistent with each other, the molecular cluster centered on this atom is considered to be already represented in the dataset. In contrast, if the results of the four models are inconsistent with each other and the error between them is in a specific range (0.5 eV/Å < error < 1.0 eV/Å in this work), the corresponding molecular cluster is added to the dataset. The dataset was updated in this way until the predictions of the four models were always consistent.
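
The selection rule can be sketched as below. The text does not define how the "error" between the four models is measured; this example assumes the root-mean-square deviation of the per-model force vectors from their mean, which is one common choice for ensemble-based active learning. Function names and the toy forces are hypothetical.

```python
import math

LOWER, UPPER = 0.5, 1.0  # eV/Å, selection window quoted in the text


def force_deviation(forces_by_model):
    """RMS deviation of per-model force vectors from their mean (assumed metric).

    forces_by_model: list of (fx, fy, fz) tuples, one per model, for one atom.
    """
    n = len(forces_by_model)
    mean = [sum(f[k] for f in forces_by_model) / n for k in range(3)]
    squared = [sum((f[k] - mean[k]) ** 2 for k in range(3)) for f in forces_by_model]
    return math.sqrt(sum(squared) / n)


def select_atoms(forces, lower=LOWER, upper=UPPER):
    """Indices of atoms whose model disagreement lies strictly inside the window."""
    return [i for i, f in enumerate(forces) if lower < force_deviation(f) < upper]


# Made-up forces (eV/Å) from four models for three atoms.
forces = [
    [(1.0, 0.0, 0.0)] * 4,                                 # models agree
    [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0)] * 2,                # deviation 0.7
    [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)] * 2,                # deviation 2.0
]
selected = select_atoms(forces)
print(selected)
```

Only the second atom falls inside the window, so only the cluster centered on it would be sent for a new reference calculation in this toy case.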

QM calculation

The potential energy and atomic forces for every structure in the final dataset were calculated with the Gaussian 16 software (ref. 72) at the MN15/6-31G** level. The MN15 functional was employed because it has broad accuracy for both multi-reference and single-reference systems73. To account for the spin polarization effect, the initial wave function of a given structure was obtained by combining the wave functions of the individual molecular species forming the structure, while the wave function of each molecular species was calculated based on its own charge and spin.

Training of the NN PES

The scheme of the NN model is shown in Fig. 6. The total energy E of a given structure is decomposed into a sum of atomic energy contributions19,74, i.e., $E = \sum_i E_i$, where i is the index of the atom. Each atomic energy is fully determined by the position of the ith atom and its near neighbors. To guarantee the translational, rotational, and permutational symmetries of the PES, the Cartesian coordinates of the atoms are mapped to specific mathematical functions called “descriptors” of the atomic chemical environment.
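
The locality of each atomic energy is enforced by a cutoff; the training settings given later in this section quote a 6.0 Å cutoff with descriptors decaying smoothly from 1.0 Å. One smooth weighting function of this kind is the cosine switch described in the DeepPot-SE paper, sketched below. Mapping the two quoted radii onto the switch parameters is our reading of the text, the function name is hypothetical, and the form implemented in a given DeePMD-kit release may differ.

```python
import math


def smooth_weight(r, r_smooth=1.0, r_cut=6.0):
    """Distance weight: 1/r inside r_smooth, smoothly switched to 0 at r_cut."""
    if r <= 0.0:
        raise ValueError("distance must be positive")
    if r < r_smooth:
        return 1.0 / r
    if r < r_cut:
        x = (r - r_smooth) / (r_cut - r_smooth)
        return (0.5 * math.cos(math.pi * x) + 0.5) / r
    return 0.0


for distance in (0.5, 1.0, 3.5, 6.0):
    print(distance, smooth_weight(distance))
```

Because the weight and its first derivative go to zero at the cutoff, neighbors entering or leaving the cutoff sphere do not cause jumps in the predicted energy or forces.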

Fig. 6. The neural network model.

The neural network model that generates the potential energy surface for MD simulation.

The DeepPot-SE (Deep Potential-Smooth Edition) model47 was used to train the NN potential with the DeePMD-kit program74. Details of this method can be found in ref. 47. The model includes two networks: the embedding network and the fitting network. Both networks use the ResNet architecture75. The size of the embedding network was set to (25, 50, 100) and the size of the embedding matrix was set to 12. The size of the fitting network was set to (240, 240, 240). The cutoff radius was set to 6.0 Å and the descriptors decay smoothly from 1.0 to 6.0 Å. The initial learning rate was set to 0.0005 and decayed every 20,000 steps. The loss is defined by

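$$
L = \frac{p_e}{N} \Delta E^2 + \frac{p_f}{3N} \sum_i \left| \Delta \mathbf{F}_i \right|^2, \tag{2}
$$

where $\Delta E$ and $\Delta \mathbf{F}_i$ are root mean square errors in energy and force. The prefactor $p_e$ is set to 0.2 eV$^{-2}$ and $p_f$ decays from 1000 Å$^2$ eV$^{-2}$ to 1 Å$^2$ eV$^{-2}$.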
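
As a worked example of Eq. (2), the sketch below evaluates the loss for a made-up two-atom structure at the two ends of the force-prefactor range. It assumes that N is the number of atoms in the structure, that ΔE is the total energy error, and that ΔF_i is the force error vector on atom i; the schedule by which the force prefactor decays is not specified in the text and is not modeled here. The function name is hypothetical.

```python
def deep_potential_loss(delta_e, delta_f, p_e, p_f):
    """Loss of Eq. (2): (p_e / N) * dE^2 + (p_f / (3N)) * sum_i |dF_i|^2.

    delta_e: energy error of the structure (eV).
    delta_f: list of per-atom force error vectors (eV/Å); N = len(delta_f).
    """
    n_atoms = len(delta_f)
    if n_atoms == 0:
        raise ValueError("at least one atom is required")
    force_term = sum(fx * fx + fy * fy + fz * fz for fx, fy, fz in delta_f)
    return p_e / n_atoms * delta_e ** 2 + p_f / (3 * n_atoms) * force_term


# Made-up errors for a two-atom structure.
delta_e = 0.1                                   # eV
delta_f = [(0.1, 0.0, 0.0), (0.0, 0.2, 0.0)]    # eV/Å

loss_start = deep_potential_loss(delta_e, delta_f, p_e=0.2, p_f=1000.0)
loss_end = deep_potential_loss(delta_e, delta_f, p_e=0.2, p_f=1.0)
print(loss_start, loss_end)
```

With the initial prefactor of 1000 the force term dominates the loss by several orders of magnitude, while at the final prefactor of 1 the energy and force terms are of comparable size in this example.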

Acknowledgements

The authors thank Dr. Linfeng Zhang and Dr. Han Wang for their discussion and help in using DeepPot-SE and DeePMD-kit. T.Z. would also like to thank Prof. Donghui Zhang for his valuable suggestions in this project. This work was supported by the National Key R&D Program of China (grant no. 2016YFA0501700), the National Natural Science Foundation of China (grant nos. 91641116, 91753103, and 21933010), and the Innovation Program of Shanghai Municipal Education Commission (201701070005E00020). J. Zeng was partially supported by the National Innovation and Entrepreneurship Training Program for Undergraduate (201910269080). We also thank the ECNU Multifunctional Platform for Innovation (No. 001) for providing supercomputer time.

Author contributions

J.Z. trained the neural network potential and performed most of the QM calculations. L.C. and M.X. analyzed the trajectory and performed part of the QM calculation. T.Z. and J.Z.H.Z. conceived the project and wrote the manuscript with input from all authors.

Data availability

The datasets (structures, potential energies and atomic forces of molecular species) generated during the current study are available at https://github.com/tongzhugroup/NNREAX, 10.6084/m9.figshare.12973055. Source data are provided with this paper.

Code availability

The codes used to generate the datasets in the current study are available at https://github.com/tongzhugroup/mddatasetbuilder, 10.5281/zenodo.4035925.

Competing interests

The authors declare no competing interests.

Footnotes

Peer review information Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Tong Zhu, Email: tzhu@lps.ecnu.edu.cn.

John Z. H. Zhang, Email: john.zhang@nyu.edu

Supplementary information

Supplementary information is available for this paper at 10.1038/s41467-020-19497-z.

References

  • 1. Martinez TJ. Ab initio reactive computer aided molecular design. Acc. Chem. Res. 2017;50:652–656. doi: 10.1021/acs.accounts.7b00010.
  • 2. Car R, Parrinello M. Unified approach for molecular-dynamics and density-functional theory. Phys. Rev. Lett. 1985;55:2471–2474. doi: 10.1103/PhysRevLett.55.2471.
  • 3. Tuckerman ME. Ab initio molecular dynamics: basic concepts, current trends and novel applications. J. Phys. Condens. Matter. 2002;14:R1297–R1355. doi: 10.1088/0953-8984/14/50/202.
  • 4. Wang L-P, et al. Discovering chemistry with an ab initio nanoreactor. Nat. Chem. 2014;6:1044. doi: 10.1038/nchem.2099.
  • 5. Van Duin AC, Dasgupta S, Lorant F, Goddard WA. ReaxFF: a reactive force field for hydrocarbons. J. Phys. Chem. A. 2001;105:9396–9409. doi: 10.1021/jp004368u.
  • 6. Brenner DW, et al. A second-generation reactive empirical bond order (REBO) potential energy expression for hydrocarbons. J. Phys. Condens. Matter. 2002;14:783. doi: 10.1088/0953-8984/14/4/312.
  • 7. Nouranian S, Tschopp MA, Gwaltney SR, Baskes MI, Horstemeyer MF. An interatomic potential for saturated hydrocarbons based on the modified embedded-atom method. Phys. Chem. Chem. Phys. 2014;16:6233–6249. doi: 10.1039/C4CP00027G.
  • 8. Qu C, Yu Q, Bowman JM. Permutationally invariant potential energy surfaces. Annu. Rev. Phys. Chem. 2018;69:151–175. doi: 10.1146/annurev-physchem-050317-021139.
  • 9. Li J, Guo H. Permutationally invariant fitting of intermolecular potential energy surfaces: a case study of the Ne-C2H2 system. J. Chem. Phys. 2015;143:214304. doi: 10.1063/1.4936660.
  • 10. Braams BJ, Bowman JM. Permutationally invariant potential energy surfaces in high dimensionality. Int. Rev. Phys. Chem. 2009;28:577–606. doi: 10.1080/01442350903234923.
  • 11. Nagy T, Yosa Reyes J, Meuwly M. Multisurface adiabatic reactive molecular dynamics. J. Chem. Theory Comput. 2014;10:1366–1375. doi: 10.1021/ct400953f.
  • 12. Warshel, A. & Florián, J. in Encyclopedia of Computational Chemistry (John Wiley and Sons, 2002).
  • 13. Meuwly M. Reactive molecular dynamics: from small molecules to proteins. Wires Comput. Mol. Sci. 2019;9:e1386. doi: 10.1002/wcms.1386.
  • 14. Koner D, Salehi SM, Mondal P, Meuwly M. Non-conventional force fields for applications in spectroscopy and chemical reaction dynamics. J. Chem. Phys. 2020;153:010901. doi: 10.1063/5.0009628.
  • 15. Zheng M, et al. Pyrolysis of liulin coal simulated by GPU-based ReaxFF MD with cheminformatics analysis. Energy Fuels. 2014;28:522–534. doi: 10.1021/ef402140n.
  • 16. Wang E, Ding J, Qu Z, Han K. Development of a reactive force field for hydrocarbons and application to iso-octane thermal decomposition. Energy Fuels. 2017;32:901–907. doi: 10.1021/acs.energyfuels.7b03452.
  • 17. Cheng T, Jaramillo-Botero A, Goddard WA, Sun H. Adaptive accelerated ReaxFF reactive dynamics with validation from simulating hydrogen combustion. J. Am. Chem. Soc. 2014;136:9434–9442. doi: 10.1021/ja5037258.
  • 18. Bertels LW, Newcomb LB, Alaghemandi M, Green JR, Head-Gordon M. Benchmarking the performance of the ReaxFF reactive force field on hydrogen combustion systems. J. Phys. Chem. A. 2020;124:5631–5645. doi: 10.1021/acs.jpca.0c02734.
  • 19. Behler J, Parrinello M. Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett. 2007;98:146401. doi: 10.1103/PhysRevLett.98.146401.
  • 20. Behler J. First principles neural network potentials for reactive simulations of large molecular and condensed systems. Angew. Chem. Int. Ed. 2017;56:12828–12840. doi: 10.1002/anie.201703114.
  • 21. Smith JS, Isayev O, Roitberg AE. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 2017;8:3192–3203. doi: 10.1039/C6SC05720A.
  • 22. Yao K, Herr JE, Toth DW, Mckintyre R, Parkhill J. The TensorMol-0.1 model chemistry: a neural network augmented with long-range physics. Chem. Sci. 2018;9:2261–2269. doi: 10.1039/C7SC04934J.
  • 23. Lee K, Yoo D, Jeong W, Han S. SIMPLE-NN: an efficient package for training and executing neural-network interatomic potentials. Comput. Phys. Commun. 2019;242:95–103. doi: 10.1016/j.cpc.2019.04.014.
  • 24. Chen X, Jørgensen MS, Li J, Hammer B. Atomic energies from a convolutional neural network. J. Chem. Theory Comput. 2018;14:3933–3942. doi: 10.1021/acs.jctc.8b00149.
  • 25. Zhang Y, Hu C, Jiang B. Embedded atom neural network potentials: efficient and accurate machine learning with a physically inspired representation. J. Phys. Chem. Lett. 2019;10:4962–4967. doi: 10.1021/acs.jpclett.9b02037.
  • 26. Chmiela S, et al. Machine learning of accurate energy-conserving molecular force fields. Sci. Adv. 2017;3:e1603015. doi: 10.1126/sciadv.1603015.
  • 27. Schutt KT, Arbabzadah F, Chmiela S, Muller KR, Tkatchenko A. Quantum-chemical insights from deep tensor neural networks. Nat. Commun. 2017;8:13890. doi: 10.1038/ncomms13890.
  • 28. Sauceda HE, Chmiela S, Poltavsky I, Muller KR, Tkatchenko A. Molecular force fields with gradient-domain machine learning: construction and application to dynamics of small molecules with coupled cluster forces. J. Chem. Phys. 2019;150:114102. doi: 10.1063/1.5078687.
  • 29. Schütt KT, Sauceda HE, Kindermans P-J, Tkatchenko A, Müller K-R. SchNet—a deep learning architecture for molecules and materials. J. Chem. Phys. 2018;148:241722. doi: 10.1063/1.5019779.
  • 30. Unke OT, Meuwly M. PhysNet: a neural network for predicting energies, forces, dipole moments, and partial charges. J. Chem. Theory Comput. 2019;15:3678–3693. doi: 10.1021/acs.jctc.9b00181.
  • 31. Christensen AS, Bratholm LA, Faber FA, Anatole von Lilienfeld O. FCHL revisited: faster and more accurate quantum machine learning. J. Chem. Phys. 2020;152:044107. doi: 10.1063/1.5126701.
  • 32. Lu X, Meng Q, Wang X, Fu B, Zhang DH. Rate coefficients of the H + H2O2 → H2 + HO2 reaction on an accurate fundamental invariant-neural network potential energy surface. J. Chem. Phys. 2018;149:174303. doi: 10.1063/1.5063613.
  • 33. Yin Z, Guan Y, Fu B, Zhang DH. Two-state diabatic potential energy surfaces of ClH2 based on nonadiabatic couplings with neural networks. Phys. Chem. Chem. Phys. 2019;21:20372–20383. doi: 10.1039/C9CP03592C.
  • 34. Zhang Y, Zhou X, Jiang B. Bridging the gap between direct dynamics and globally accurate reactive potential energy surfaces using neural networks. J. Phys. Chem. Lett. 2019;10:1185–1191. doi: 10.1021/acs.jpclett.9b00085.
  • 35. Chen J, Xu X, Xu X, Zhang DH. Communication: An accurate global potential energy surface for the OH + CO → H + CO2 reaction using neural networks. J. Chem. Phys. 2013;138:221104. doi: 10.1063/1.4811109.
  • 36. Huang SD, Shang C, Kang PL, Zhang XJ, Liu ZP. LASP: fast global potential energy surface exploration. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2019;9:e1415.
  • 37. Kang PL, Shang C, Liu ZP. Glucose to 5-hydroxymethylfurfural: origin of site-selectivity resolved by machine learning based reaction sampling. J. Am. Chem. Soc. 2019;141:20525–20536. doi: 10.1021/jacs.9b11535.
  • 38. Brickel S, Das AK, Unke OT, Turan HT, Meuwly M. Reactive molecular dynamics for the [Cl–CH3–Br]− reaction in the gas phase and in solution: a comparative study using empirical and neural network force fields. Electron. Struct. 2019;1:024002. doi: 10.1088/2516-1075/ab1edb.
  • 39. Zhang L, Han J, Wang H, Car R, E W. Deep potential molecular dynamics: a scalable model with the accuracy of quantum mechanics. Phys. Rev. Lett. 2018;120:143001. doi: 10.1103/PhysRevLett.120.143001.
  • 40. Han JQ, Zhang LF, Car R, E W. Deep potential: a general representation of a many-body potential energy surface. Commun. Comput. Phys. 2018;23:629–639. doi: 10.4208/cicp.OA-2017-0213.
  • 41. Jia, W. et al. Pushing the limit of molecular dynamics with ab initio accuracy to 100 million atoms with machine learning. Preprint at https://arxiv.org/abs/2005.00223 (2020).
  • 42. Blum LC, Reymond J-L. 970 million druglike small molecules for virtual screening in the chemical universe database GDB-13. J. Am. Chem. Soc. 2009;131:8732–8733. doi: 10.1021/ja902302h.
  • 43. Ruddigkeit L, Van Deursen R, Blum LC, Reymond J-L. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. J. Chem. Inf. Model. 2012;52:2864–2875. doi: 10.1021/ci300415d.
  • 44. Smith JS, Isayev O, Roitberg AE. ANI-1, a data set of 20 million calculated off-equilibrium conformations for organic molecules. Sci. Data. 2017;4:170193. doi: 10.1038/sdata.2017.193.
  • 45. Smith JS, et al. The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules. Sci. Data. 2020;7:134. doi: 10.1038/s41597-020-0473-z.
  • 46. He Z, Li X-B, Liu L-M, Zhu W. The intrinsic mechanism of methane oxidation under explosion condition: a combined ReaxFF and DFT study. Fuel. 2014;124:85–90. doi: 10.1016/j.fuel.2014.01.070.
  • 47. Zhang, L. et al. End-to-end symmetry preserving inter-atomic potential energy model for finite and extended systems. In: Bengio, S. et al. (eds) Advances in Neural Information Processing Systems 31, 4436–4446 (Curran Associates Inc, 2018).
  • 48. Smith, G. P. et al. GRI-Mech 3.0. http://combustion.berkeley.edu/gri-mech/ (1999).
  • 49. Reid IAB, Robinson C, Smith DB. Spontaneous ignition of methane: Measurement and chemical model. Symp. Int. Combust. Proc. 1985;20:1833–1843. doi: 10.1016/S0082-0784(85)80681-0.
  • 50. Wu YZ, Sun H, Wu L, Deetz JD. Extracting the mechanisms and kinetic models of complex reactions from atomistic simulation data. J. Comput. Chem. 2019;40:1586–1592. doi: 10.1002/jcc.25809.
  • 51. Dontgen M, et al. Automated discovery of reaction pathways, rate constants, and transition states using reactive molecular dynamics simulations. J. Chem. Theory Comput. 2015;11:2517–2524. doi: 10.1021/acs.jctc.5b00201.
  • 52. Zhang Y, et al. DP-GEN: a concurrent learning platform for the generation of reliable deep learning based potential energy models. Comput. Phys. Commun. 2020;253:107206. doi: 10.1016/j.cpc.2020.107206.
  • 53. Ju Y, Sun W. Plasma assisted combustion: dynamics and chemistry. Prog. Energy Combust. Sci. 2015;48:21–83. doi: 10.1016/j.pecs.2014.12.002.
  • 54. Chen W-K, Liu X-Y, Fang W-H, Dral PO, Cui G. Deep learning for nonadiabatic excited-state dynamics. J. Phys. Chem. Lett. 2018;9:6702–6708. doi: 10.1021/acs.jpclett.8b03026.
  • 55. Hu D, Xie Y, Li X, Li L, Lan Z. Inclusion of machine learning kernel ridge regression potential energy surfaces in on-the-fly nonadiabatic molecular dynamics simulation. J. Phys. Chem. Lett. 2018;9:2725–2732. doi: 10.1021/acs.jpclett.8b00684.
  • 56. Westermayr J, et al. Machine learning enables long time scale molecular photodynamics simulations. Chem. Sci. 2019;10:8100–8107. doi: 10.1039/C9SC01742A.
  • 57. Westermayr J, Faber FA, Christensen AS, von Lilienfeld OA, Marquetand P. Neural networks and kernel ridge regression for excited states dynamics of CH2NH2+: from single-state to multi-state representations and multi-property machine learning models. Mach. Learn.: Sci. Technol. 2020;1:025009. doi: 10.1088/2632-2153/ab88d0.
  • 58. Borges YG, Galvão BRL, Mota VC, Varandas AJC. A trajectory surface hopping study of N2(A3Σu+) quenching by H atoms. Chem. Phys. Lett. 2019;729:61–64. doi: 10.1016/j.cplett.2019.05.016.
  • 59. Schinke R, Grebenshchikov SY, Ivanov MV, Fleurat-Lessard P. Dynamical studies of the ozone isotope effect: a status report. Annu. Rev. Phys. Chem. 2006;57:625–661. doi: 10.1146/annurev.physchem.57.032905.104542.
  • 60. Pezzella M, Koner D, Meuwly M. Formation and stabilization of ground and excited-state singlet O2 upon recombination of 3P oxygen on amorphous solid water. J. Phys. Chem. Lett. 2020;11:2171–2176. doi: 10.1021/acs.jpclett.0c00130.
  • 61. Koner D, Bemish RJ, Meuwly M. The C(3P) + NO(X2Π) → O(3P) + CN(X2Σ+), N(2D)/N(4S) + CO(X1Σ+) reaction: rates, branching ratios, and final states from 15 K to 20 000 K. J. Chem. Phys. 2018;149:094305. doi: 10.1063/1.5046906.
  • 62. Koner D, Unke OT, Boe K, Bemish RJ, Meuwly M. Exhaustive state-to-state cross sections for reactive molecular collisions from importance sampling simulation and a neural network representation. J. Chem. Phys. 2019;150:211101. doi: 10.1063/1.5097385.
  • 63. BIOVIA, Materials Studio 2017 https://www.3ds.com/products-services/biovia/resource-center/citations-and-references/ (Dassault Systèmes, San Diego, 2017).
  • 64. Aktulga HM, Fogarty JC, Pandit SA, Grama AY. Parallel reactive molecular dynamics: numerical methods and algorithmic techniques. Parallel Comput. 2012;38:245–259. doi: 10.1016/j.parco.2011.08.005.
  • 65. Chenoweth K, Van Duin AC, Goddard WA. ReaxFF reactive force field for molecular dynamics simulations of hydrocarbon oxidation. J. Phys. Chem. A. 2008;112:1040–1053. doi: 10.1021/jp709896w.
  • 66. O’Boyle NM, et al. Open Babel: an open chemical toolbox. J. Cheminformatics. 2011;3:33. doi: 10.1186/1758-2946-3-33.
  • 67. Tarjan R. Depth-first search and linear graph algorithms. SIAM J. Comput. 1972;1:146–160. doi: 10.1137/0201010.
  • 68. Rupp M, Tkatchenko A, Müller K-R, Von Lilienfeld OA. Fast and accurate modeling of molecular atomization energies with machine learning. Phys. Rev. Lett. 2012;108:058301. doi: 10.1103/PhysRevLett.108.058301.
  • 69. Hloucha M, Deiters U. Fast coding of the minimum image convention. Mol. Simul. 1998;20:239–244.
  • 70. Sculley, D. Web-scale k-means clustering. In: Rappa, M. et al. (eds) Proc. 19th International Conference on World Wide Web (ACM, 2010).
  • 71. Zhang L, Lin D-Y, Wang H, Car R, E W. Active learning of uniformly accurate interatomic potentials for materials simulation. Phys. Rev. Mater. 2019;3:023804.
  • 72. Frisch, M. et al. Gaussian 16, revision A. 03 (Gaussian Inc, Wallingford CT, 2016).
  • 73. Yu HS, He X, Li SL, Truhlar DG. MN15: A Kohn–Sham global-hybrid exchange–correlation density functional with broad accuracy for multi-reference and single-reference systems and noncovalent interactions. Chem. Sci. 2016;7:5032–5051. doi: 10.1039/C6SC00705H.
  • 74. Wang H, Zhang L, Han J, E W. DeePMD-kit: a deep learning package for many-body potential energy representation and molecular dynamics. Comput. Phys. Commun. 2018;228:178–184. doi: 10.1016/j.cpc.2018.03.016.
  • 75. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In: Tuytelaars, T. et al. (eds) Proc. IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2016).


Articles from Nature Communications are provided here courtesy of Nature Publishing Group