Does Hessian Data Improve the Performance of Machine Learning Potentials?

Austin Rodriguez; Justin S Smith; Jose L Mendoza-Cortes

doi:10.1021/acs.jctc.5c00402

2025 Jul 2;21(14):6698–6710.


PMCID: PMC12288065 PMID: 40601303

Abstract

The integration of machine learning into reactive chemistry, materials discovery, and drug design is transforming the development of novel molecules and materials. Machine Learning Interatomic Potentials (MLIPs) predict potential energies and forces with quantum chemistry accuracy, surpassing traditional approaches. Incorporating force fitting in MLIP training enhances potential-energy surface (PES) predictions and improves model transferability and reliability. This paper introduces and evaluates Hessian matrix training in MLIPs; the Hessian encodes second-order information about the PES curvature. Our evaluation focuses on models trained only to equilibrium geometries and first-order saddle points (i.e., critical points on the potential surface) and demonstrates their ability to extrapolate to nonequilibrium geometries. Hessian training improves extrapolation, allowing MLIPs to accurately predict energies, forces, and Hessians for nonequilibrium geometries. Hessian-trained MLIPs enhance reaction pathway modeling, transition state identification, and vibrational spectra predictions, benefiting molecular dynamics (MD) simulations and Nudged Elastic Band (NEB) calculations. By analyzing models trained with varying combinations of energy, force, and Hessian data on a small-molecule reactive data set, we demonstrate that models including Hessian information not only extrapolate more accurately to unseen molecular systems, improving accuracy in reaction modeling and vibrational analysis, but also reduce the total amount of data required for effective training. However, the primary trade-off is increased computational expense, as Hessian training requires more resources than conventional energy-force training.
Our findings provide comprehensive insights into the advantages and limitations of Hessian integration in MLIP training, allowing practitioners in computational chemistry to make informed decisions about employing this method in accordance with their specific research objectives and computational constraints.


1. Introduction

The application of machine learning to drug and material design promises to revolutionize the development of new molecules and materials for real-world applications. Machine learning interatomic potentials (MLIPs), which predict potential energies and forces with quantum chemistry accuracy, are enabling new science by removing computational barriers to many applications, including reactive chemistry, materials discovery and characterization, and drug design. MLIPs are typically trained to potential energies and forces calculated with density functional theory (DFT) or post-Hartree–Fock methods.

The potential energy and atomic forces of a system of atoms are mathematically related: the forces are the negative gradient of the potential energy, i.e., the negative of the slope of the potential energy surface (PES) at a given point in atomic coordinate space. Since most MLIPs are constructed to meet the requirements of a mathematical potential, force fitting through a modified loss function acts as a natural regularization for learning the PES of a system of atoms. The shift from energy-only training to force training in MLIP development, around 2018, was a significant advance that enhanced the fidelity of atomistic simulations with MLIPs. This transition, highlighted by efforts to simulate infrared spectra with machine learning molecular dynamics, reflects a wider trend toward more accurate and transferable MLIPs. Similarly, adding stress to the force and energy terms of the loss function has been shown to be crucial for accurately reproducing specific phenomena such as phase transitions. Furthermore, MLIP packages such as TorchANI make force training widely accessible, encouraging broader application and experimentation within the scientific community.
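As a small numerical check of the energy–force relation, the following sketch compares an analytic force with the negative central finite-difference slope of the energy. It assumes a toy one-dimensional Morse potential with arbitrary parameters, not any MLIP.

```python
import math

# Toy Morse potential for a diatomic (arbitrary parameters; energies in
# kcal/mol, distances in angstrom). Not an MLIP: used only to check F = -dE/dr.
D, A, R0 = 100.0, 2.0, 1.0


def morse_energy(r: float) -> float:
    return D * (1.0 - math.exp(-A * (r - R0))) ** 2


def morse_force(r: float) -> float:
    # Analytic force along r, F = -dE/dr
    x = math.exp(-A * (r - R0))
    return -2.0 * D * A * (1.0 - x) * x


def finite_difference_force(r: float, h: float = 1e-5) -> float:
    # Negative central-difference slope of the energy
    return -(morse_energy(r + h) - morse_energy(r - h)) / (2.0 * h)


if __name__ == "__main__":
    for r in (0.9, 1.0, 1.3):
        print(f"r = {r:.2f} A  analytic F = {morse_force(r):+.6f}  "
              f"numeric F = {finite_difference_force(r):+.6f}")
```

The force is zero at the equilibrium distance, positive (repulsive) when the bond is compressed, and negative (attractive) when it is stretched.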

The Hessian matrix, the matrix of second-order partial derivatives, offers deeper insight into the curvature of the underlying surface at a given geometry. Applied to a potential energy function, such as one describing the potential energy of 3D molecules and materials, the Hessian provides critical information about the local shape of the surface around a geometry: it delineates regions of concavity and convexity and identifies local minima, maxima, and saddle points. These characteristics are pivotal for predicting molecular stability and reaction pathways and for understanding the intrinsic properties of materials at the atomic and molecular levels. Incorporating the Hessian matrix into MLIP training, along with forces, represents a significant step toward refining the accuracy and transferability of these models. Because the Hessian distinguishes local minima from saddle points, it directly enhances the ability of MLIPs to model complex energy landscapes. This approach captures not only the forces acting on the atoms (first-order derivatives) but also the shape of the energy landscape (second-order derivatives). Such an enriched representation allows for a more nuanced understanding of chemical reactions, including the identification of transition states and mechanisms of phase transitions.
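The classification of stationary points can be illustrated numerically. At a stationary point, the number of negative Hessian eigenvalues separates a minimum (none) from a first-order saddle point such as a transition state (exactly one). The sketch below applies this rule under two assumptions: a toy two-dimensional double-well surface and a finite-difference Hessian. Neither the surface nor the helper names come from the paper, and the rule is only meaningful at points where the gradient vanishes.

```python
import math


def double_well(x, y):
    # Toy surface: minima at (+1, 0) and (-1, 0), first-order saddle at (0, 0)
    return (x * x - 1.0) ** 2 + y * y


def hessian_2d(f, x, y, h=1e-4):
    """Central finite-difference Hessian entries (f_xx, f_xy, f_yy)."""
    fxx = (f(x + h, y) - 2.0 * f(x, y) + f(x - h, y)) / h ** 2
    fyy = (f(x, y + h) - 2.0 * f(x, y) + f(x, y - h)) / h ** 2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h ** 2)
    return fxx, fxy, fyy


def eigenvalues_2x2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], ascending."""
    mean = 0.5 * (a + d)
    radius = math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    return mean - radius, mean + radius


def classify_stationary_point(f, x, y, tol=1e-6):
    """Label a stationary point by its number of negative Hessian eigenvalues."""
    n_negative = sum(ev < -tol for ev in eigenvalues_2x2(*hessian_2d(f, x, y)))
    return {0: "minimum", 1: "first-order saddle point", 2: "maximum"}[n_negative]


if __name__ == "__main__":
    for point in [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0)]:
        print(point, classify_stationary_point(double_well, *point))
```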

In this paper, we implement a Hessian loss term for fitting to a reactive chemistry data set that includes Hessian-labeled data. Our goal is to evaluate the advantages and disadvantages of including Hessian fitting when developing MLIPs for reactive chemistry. The data set used in this evaluation includes 35,087 data points from 11,961 reactions. We test models trained with different combinations of Hessian, force, and energy loss terms, and with different subsets of our reactive data set, to analyze the impact of using Hessian data in training. As summarized in the overview figure below, training with a Hessian loss is more expensive, but models trained with energy, force, and Hessian data extrapolate better than models trained only with energies or with energies and forces. Because of this better extrapolation, fewer data are needed overall in the training process, which partially offsets the higher computational cost. We further show that adding Hessian information allows a model trained only to the critical points (reactant, transition state, and product) of an intrinsic reaction coordinate (IRC) pathway to perform reasonably well on the entire IRC, on perturbed structures along the IRC, and on perturbed structures involved in molecular dynamics (MD) simulations. The most significant downsides of adding second-derivative information to MLIP training are the increased training time and the high computational cost of generating these data with quantum chemistry. These costs make the method less suited to active learning data set generation. In presenting these findings, we aim to give computational chemistry practitioners a thorough understanding of the benefits and limitations of using Hessian data to train MLIPs. This knowledge allows them to make informed decisions about generating and incorporating Hessian data based on their specific research needs and computational resources.

Overall view of the effects of using energies, forces, and Hessian information on the MLIP's prediction accuracy for the energetics of molecules outside the training data set. The two-dimensional potential energy surface (PES) is shown in plot (a). Plots (b–d) show the average predicted energies of intermediate structures along a sample chemical reaction's intrinsic reaction coordinate (IRC) path, with their standard deviation (colored areas above and below the average). These predictions are made by ensembles of MLIP models fitted to the energies (b); energies and forces (c); and energies, forces, and Hessian data (d) of only the DFT-calculated transition state (TS) geometry of the same sample reaction. Plot (e) depicts the energy prediction errors for nonequilibrium structures from models fitted to energies; energies and forces; and energies, forces, and Hessian data. These models were trained with different percentages of a data set of reactants, TSs, and products. Plot (f) shows the temperature reached before failure in molecular dynamics (MD) simulations of a subset of molecules, using the different models as force fields. Finally, plot (g) shows how the training time of the different fittings increases with the amount of training data.

2. Methodology

The primary objective of this work is to analyze the impact of incorporating the Hessian matrix of each molecular geometry into the training of machine learning interatomic potentials (MLIPs), exemplified here by the ANI model. This incorporation aims to enhance the predictive accuracy and extrapolation capabilities of the models. The goal is for them to reproduce DFT potential energies of molecular systems involved in chemical reactions while minimizing the number of training data points required.

Specifically, each molecular configuration fed into the model includes the inputs and loss targets listed in Table 1. M denotes the number of molecules in a given batch or data set. N is the number of atoms in an individual molecule or, for a batch or data set, the maximum number of atoms in any one molecule (the entries for smaller molecules are zero-padded). E_i denotes the molecular energy of molecule i, where i ranges from 1 to M. Finally, x_j and x_k denote Cartesian coordinates in three-dimensional space, with j and k ranging from 1 to 3N.

Table 1. Inputs and Loss Targets Used in Training the MLIP Model.

role	property	dimensions	representation	units
inputs	species	M × N	Z_l	None
	coordinates	M × N × 3	x_j	Å
loss targets	energy	M × 1	E_i	kcal mol^–1
	atomic forces	M × N × 3	−∂E_i/∂x_j	kcal Å^–1 mol^–1
	Hessian matrix	M × 3N × 3N	∂²E_i/(∂x_j ∂x_k)	kcal Å^–2 mol^–1
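The shapes in Table 1 can be made concrete with a short sketch. For a hypothetical batch of molecules with given atom counts, it reports the padded tensor shapes and the fraction of force and Hessian entries that hold real (non-padding) data. The helper names and the example atom counts are illustrative assumptions.

```python
def batch_shapes(atom_counts):
    """Padded tensor shapes (as in Table 1) for a batch of molecules.

    atom_counts: number of atoms in each molecule of the batch. N is taken as
    the largest count, and smaller molecules are assumed to be zero-padded.
    """
    m, n = len(atom_counts), max(atom_counts)
    return {
        "species": (m, n),
        "coordinates": (m, n, 3),
        "energy": (m, 1),
        "forces": (m, n, 3),
        "hessian": (m, 3 * n, 3 * n),
    }


def n_elements(shape):
    total = 1
    for dim in shape:
        total *= dim
    return total


def real_fraction(atom_counts):
    """Fractions of padded force and Hessian entries that hold real data."""
    shapes = batch_shapes(atom_counts)
    real_forces = sum(3 * n for n in atom_counts)
    real_hessian = sum((3 * n) ** 2 for n in atom_counts)
    return (real_forces / n_elements(shapes["forces"]),
            real_hessian / n_elements(shapes["hessian"]))


if __name__ == "__main__":
    counts = [4, 10, 7]  # hypothetical batch
    for name, shape in batch_shapes(counts).items():
        print(f"{name:12s} {shape}  elements = {n_elements(shape)}")
    print("real fraction (forces, Hessian):", real_fraction(counts))
```

For a batch with 4, 10, and 7 atoms, 70% of the padded force entries but only 55% of the padded Hessian entries are real data. The Hessian block also holds 2,700 entries, compared with 90 for the forces, which illustrates why the quadratic Hessian shape dominates memory.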


2.1. Data Collection and Preparation

The initial step involves systematically collecting a diverse data set that encompasses a broad spectrum of molecular geometries relevant to chemical reactions. While this work was in progress, a database of molecules derived from the QM9 data set was published containing numerical Hessian matrices; these molecular configurations were exclusively equilibrium geometries. In this work, we focus on developing a database of Hessian calculations for structures corresponding to minima as well as saddle points (transition states) on the PES of a large set of reactions. We generated a comprehensive initial data set that contains tens of thousands of elementary chemical reactions based on the data set of Grambow et al. Each reaction in that data set comprises DFT-optimized geometries of reactants, products, and transition states, providing detailed information on the molecular structures and energetics involved. The geometry optimizations in that original data set were performed using the ωB97XD functional and the def2-TZVP basis set.

Starting with the optimized structures from this initial data set, we reoptimized each reactant, transition state, and product so that the geometries correspond to local minima or saddle points at our chosen level of theory and in our chosen software. We then performed frequency calculations to obtain the analytical DFT Hessian matrix. All calculations were performed with the Gaussian16 software package at the ωB97XD/6-31G(d) level of theory. The convergence criteria for geometry optimizations in Gaussian16 require the maximum and RMS force components to be below 4.5 × 10^–4 hartree/Bohr (approximately 0.53 kcal Å^–1 mol^–1) and 3.0 × 10^–4 hartree/Bohr (approximately 0.36 kcal Å^–1 mol^–1), respectively. They also require the maximum and RMS atomic displacements to be below 1.8 × 10^–3 Bohr (approximately 9.5 × 10^–4 Å) and 1.2 × 10^–3 Bohr (approximately 6.4 × 10^–4 Å), respectively.
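The approximate equivalences quoted above follow from 1 hartree ≈ 627.5095 kcal mol^–1 and 1 Bohr ≈ 0.529177 Å. A minimal sketch of the conversions is below; the function names are illustrative. The last helper applies the same factors, with Bohr squared, to convert a Hessian from hartree Bohr^–2 into the kcal Å^–2 mol^–1 units listed in Table 1.

```python
HARTREE_TO_KCAL_MOL = 627.509474   # kcal/mol per hartree
BOHR_TO_ANGSTROM = 0.529177210903  # angstrom per Bohr (CODATA 2018)


def force_au_to_kcal_mol_angstrom(f_hartree_per_bohr):
    return f_hartree_per_bohr * HARTREE_TO_KCAL_MOL / BOHR_TO_ANGSTROM


def length_bohr_to_angstrom(x_bohr):
    return x_bohr * BOHR_TO_ANGSTROM


def hessian_au_to_kcal_mol_angstrom2(h_hartree_per_bohr2):
    return h_hartree_per_bohr2 * HARTREE_TO_KCAL_MOL / BOHR_TO_ANGSTROM ** 2


if __name__ == "__main__":
    # Optimization thresholds quoted in the text
    for label, value in [("max force", 4.5e-4), ("RMS force", 3.0e-4)]:
        print(f"{label}: {force_au_to_kcal_mol_angstrom(value):.2f} kcal/(mol A)")
    for label, value in [("max displacement", 1.8e-3), ("RMS displacement", 1.2e-3)]:
        print(f"{label}: {length_bohr_to_angstrom(value):.1e} A")
```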

The choice of the ωB97XD functional and the 6-31G(d) basis set is motivated by several factors. First, ωB97XD is a widely used range-separated hybrid density functional that combines a long-range correction with an empirical dispersion correction. Together these improve the description of noncovalent interactions such as hydrogen bonding and dispersion. This functional has shown good performance in reproducing a wide range of molecular properties, making it suitable for studying chemical reactions involving H, C, N, and O. Additionally, the 6-31G(d) basis set provides a balanced description of the electronic structure while maintaining computational efficiency. It includes polarization functions that capture the redistribution of electron density around atoms, allowing for an accurate representation of charge distributions and molecular properties. However, this basis set is known to have limitations in describing long-range interactions and noncovalent complexes. These limitations should be considered when extending MLIPs to systems where dispersion or weak interactions play a major role. Another important consideration is that this functional and basis set are compatible with the ANI-1x model, on which our Hessian-trained MLIP is based.

Our data set includes not only 35,087 molecular geometries from 11,961 elementary chemical reactions but also key properties such as electronic energies, atomic forces, and Hessian matrices. A detailed analysis of the composition of the data set, including distributions of atom types and frequencies of bond types, is given in Figures S1 and S2 in the Supporting Information. To evaluate the ability of the Hessian-trained MLIP to extrapolate, we developed a data set of 34,248 structures from 600 intrinsic reaction coordinate (IRC) paths for a randomly selected subset of the reactions. These IRC geometries act as a rigorous benchmark for testing the accuracy of our MLIP in predicting reaction pathways. Finally, to test whether the Hessian-trained MLIP extrapolates to other nonequilibrium structures outside the IRC path, we generated 62,527 perturbed molecular structures via normal mode sampling (NMS). These were derived from the intermediate IRC structures of 574 reactions in the randomly selected subset. Together, these sets provide the diversity and coverage needed to evaluate the machine learning model and to apply the Hessian-trained MLIP.
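Normal mode sampling perturbs a geometry along its vibrational normal modes. The sketch below is a simplified version in the spirit of the NMS scheme used to build the ANI data sets, and it rests on several assumptions, since the paper does not state its NMS parameters. The mode vectors are taken as given orthonormal Cartesian displacement vectors, and the force constants and temperature are inputs. Each mode receives a random share c_i of a harmonic energy budget of (3/2) N k_B T, with the shares summing to a random value below 1. The mode is then displaced by R_i = ±sqrt(3 c_i N k_B T / K_i), so that its harmonic energy is (1/2) K_i R_i².

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant, kcal/(mol K)


def normal_mode_sample(x0, modes, force_constants, temperature, rng):
    """Displace a geometry along normal modes (simplified NMS sketch).

    x0: flat list of 3N Cartesian coordinates (angstrom)
    modes: orthonormal displacement vectors of length 3N (assumed given)
    force_constants: positive K_i for each mode, kcal/(mol A^2)
    """
    n_atoms = len(x0) // 3
    c = [rng.random() for _ in modes]
    scale = rng.random() / sum(c)  # shares c_i now sum to a random value < 1
    x = list(x0)
    amplitudes = []
    for ci, mode, k in zip(c, modes, force_constants):
        ci *= scale
        # (1/2) K_i R_i^2 = (3/2) c_i N k_B T  ->  R_i = sqrt(3 c_i N k_B T / K_i)
        r = math.sqrt(3.0 * ci * n_atoms * K_B * temperature / k)
        r *= rng.choice((-1.0, 1.0))
        amplitudes.append(r)
        for j, q in enumerate(mode):
            x[j] += r * q
    return x, amplitudes


def harmonic_energy(amplitudes, force_constants):
    return sum(0.5 * k * r * r for r, k in zip(amplitudes, force_constants))
```

By construction, the harmonic energy of every sample is at most (3/2) N k_B T. Real NMS would obtain the modes and force constants from the Hessian at the reference geometry.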

2.2. Hessian Matrix Incorporation

To achieve a more detailed understanding of the Potential Energy Surface (PES) of molecular systems, extending our data set to include information beyond merely potential energy and atomic forces is crucial. A pivotal advancement in this work is the integration of the Hessian matrix in the loss function during MLIP model training. The Hessian matrix, detailed in Equation S1, contains the second derivatives of the total molecular energy with respect to the atomic positions for each molecular geometry within our data set. The Hessian matrix contains information about the dynamic properties of chemical systems such as the vibrational characteristics of molecules and their stability.

2.3. Machine Learning Model Adaptation

The ANI model uses modified Behler–Parrinello symmetry functions G_k to capture the chemical environment surrounding each atom. These environments are encoded in atomic environment vectors (AEVs), which probe the radial and angular domains surrounding an atom. Once the atomic coordinates of a chemical system are translated into AEVs, these vectors become the inputs of a specialized form of high-dimensional neural network potential (HD-NNP). A separate NNP is used for each element, with its own set of weights and biases fitted during training to that element's chemical environments in the data set. Architecturally, these HD-NNPs are feedforward neural networks with multiple hidden layers of varying numbers of neurons. The output of each NNP is the contribution of one atom to the molecular potential energy, and these atomic contributions are summed to obtain the total potential energy. The general flow of information is visualized in Figure S3, where the energy of a single formaldehyde molecule is calculated using the MLIP model.
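The flow from coordinates to AEVs to per-element networks to a summed energy can be sketched in a few lines. This toy version makes several assumptions:

- it keeps only radial terms of the Behler–Parrinello form exp(−η(r − R_s)²) multiplied by a cosine cutoff;
- it ignores the neighbor-species resolution and angular terms of real ANI AEVs;
- it replaces each element's network with a made-up linear layer.

It still shows the two structural properties that matter here. The energy is a sum of atomic contributions, and it is invariant to translating or rotating the molecule and to permuting identical atoms.

```python
import math

ETA = 16.0                      # Gaussian width parameter (1/A^2), illustrative
SHIFTS = (0.9, 1.4, 1.9, 2.4)   # radial shifts R_s (A), illustrative
R_CUT = 5.2                     # cutoff radius (A), illustrative


def cosine_cutoff(r):
    # Decays smoothly from 1 at r = 0 to 0 at r = R_CUT
    return 0.5 * math.cos(math.pi * r / R_CUT) + 0.5 if r < R_CUT else 0.0


def radial_features(coords, i):
    """Radial symmetry-function values G_k for atom i (no species resolution)."""
    g = [0.0] * len(SHIFTS)
    for j, xyz in enumerate(coords):
        if j == i:
            continue
        r = math.dist(coords[i], xyz)
        for k, rs in enumerate(SHIFTS):
            g[k] += math.exp(-ETA * (r - rs) ** 2) * cosine_cutoff(r)
    return g


# Hypothetical element-specific models: one linear layer per element stands in
# for ANI's per-element feedforward networks (weights are made up).
ELEMENT_MODELS = {
    "H": ([-0.30, 0.10, 0.05, 0.00], -0.5),
    "C": ([-0.80, -0.20, 0.10, 0.02], -38.0),
    "O": ([-1.00, -0.10, 0.05, 0.01], -75.0),
}


def atomic_energy(element, g):
    weights, bias = ELEMENT_MODELS[element]
    return bias + sum(w * gk for w, gk in zip(weights, g))


def molecular_energy(species, coords):
    # Total energy = sum of per-atom contributions
    return sum(atomic_energy(s, radial_features(coords, i))
               for i, s in enumerate(species))


if __name__ == "__main__":
    # Approximate formaldehyde geometry (angstrom), for illustration only
    species = ["C", "O", "H", "H"]
    coords = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.21),
              (0.0, 0.94, -0.54), (0.0, -0.94, -0.54)]
    print(molecular_energy(species, coords))
```

Because the toy model depends on the coordinates only through interatomic distances, the translation, rotation, and permutation invariances hold by construction.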

2.3.1. Force and Hessian Calculation Using Automatic Differentiation

Atomic forces and Hessian matrices can be calculated with automatic differentiation, a computational technique that evaluates exact derivatives of complex functions efficiently. Because the neural network is a composition of differentiable operations, its energy output can be differentiated directly with respect to the input coordinates. This capability is crucial for modeling the dynamics of molecular systems. It provides first- and second-order derivatives of the model energy for any geometry without finite-difference approximations.

In practice, the MLIP is fed batches of hundreds of molecules at a time, and the dimensionality of the PyTorch tensors is critical when training these NNP models. As listed in Table 1, the coordinate and force tensors have shape [batch_size, N, 3]. Here batch_size is the number of molecules in a batch and N is the number of atoms in the largest molecule in the data set. If a molecule has fewer than N atoms, the rest of its tensor is padded with zeros, so the coordinate and force tensors are padded along a single (atom) dimension until they reach the common shape. The atomic forces are calculated simultaneously for all molecules in a batch with PyTorch's autograd.grad function. It returns the first derivatives of the energy with respect to the coordinates of each molecule, and multiplying these by −1 gives the forces. The Hessian matrix can be obtained from the model in a similar manner. The energy gradient (the negative of the forces) is computed with create_graph=True so that it can be differentiated again. Applying autograd.grad to it then yields the second derivatives of the energy with respect to each pair of Cartesian coordinates. However, the Hessian format introduces some difficulties. The Hessian tensor grows quadratically with the number of atoms (3N × 3N elements). The Hessian of a molecule with fewer than N atoms must also be padded along two dimensions instead of one to reach the batch shape [batch_size, 3N, 3N]. This scaling presents a challenge for both memory and data handling. Batching Hessians requires more memory than batching energies or forces alone and complicates training because of the larger tensor dimensionality and the sparsity introduced by padding.
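The two-dimensional padding described above can be sketched in plain Python. Each per-molecule Hessian is copied into the top-left corner of a 3N × 3N block of zeros. A boolean mask then records which entries are real, so that padding can be excluded from the loss. This is an illustrative sketch with hypothetical helper names, not the TorchANI implementation.

```python
def pad_hessian(hessian, n_max):
    """Zero-pad a 3n x 3n Hessian (nested lists) to 3*n_max x 3*n_max.

    Returns the padded matrix and a same-shaped boolean mask that is True
    only where the entry comes from the original Hessian.
    """
    size, full = len(hessian), 3 * n_max
    if size % 3 or size > full or any(len(row) != size for row in hessian):
        raise ValueError("expected a square 3n x 3n Hessian with n <= n_max")
    padded = [[0.0] * full for _ in range(full)]
    mask = [[False] * full for _ in range(full)]
    for a in range(size):
        for b in range(size):
            padded[a][b] = hessian[a][b]
            mask[a][b] = True
    return padded, mask


def pad_batch(hessians):
    """Pad per-molecule Hessians to the largest molecule in the batch."""
    n_max = max(len(h) // 3 for h in hessians)
    pairs = [pad_hessian(h, n_max) for h in hessians]
    return [p for p, _ in pairs], [m for _, m in pairs]
```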
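The two equations below are derived with the chain rule for a simplified model in which the total energy E_T depends on the coordinates through three symmetry functions G_k. They are a simple representation of the calculations performed by automatic differentiation.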

F_{x_{m}} = - \frac{\partial E_{T}}{\partial x_{m}} = - \sum_{k = 1}^{3} (\frac{\partial E_{T}}{\partial G_{k}} \times \frac{\partial G_{k}}{\partial x_{m}})

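\begin{aligned} H_{x_{m} x_{n}} &= \frac{\partial (\partial E_{T} / \partial x_{m})}{\partial x_{n}} \\ &= \frac{\partial}{\partial x_{n}} \left[ \sum_{k = 1}^{3} \left( \frac{\partial E_{T}}{\partial G_{k}} \times \frac{\partial G_{k}}{\partial x_{m}} \right) \right] \\ &= \sum_{k = 1}^{3} \sum_{l = 1}^{3} \left( \frac{\partial^{2} E_{T}}{\partial G_{l} \partial G_{k}} \times \frac{\partial G_{l}}{\partial x_{n}} \times \frac{\partial G_{k}}{\partial x_{m}} \right) + \sum_{k = 1}^{3} \left( \frac{\partial E_{T}}{\partial G_{k}} \times \frac{\partial^{2} G_{k}}{\partial x_{n} \partial x_{m}} \right) \end{aligned}

The first sum vanishes when E_T is linear in the G_k, leaving only the second sum; for a neural network, whose output is nonlinear in its inputs, both sums contribute.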
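The derivatives in the two equations above are what automatic differentiation evaluates mechanically. The paper uses PyTorch's reverse-mode autograd. As a dependency-free stand-in, the sketch below uses a different technique, forward-mode differentiation with hyper-dual numbers, to compute the exact gradient and Hessian of a toy pair energy (an assumption, not the ANI energy). Seeding the two infinitesimal parts along coordinates p and q makes the e1·e2 part of the result equal to ∂²E/∂x_p∂x_q. The product and chain rules, including the second-derivative terms of intermediate quantities, are applied automatically at every operation.

```python
import math


class HyperDual:
    """Hyper-dual number a + b*e1 + c*e2 + d*e1*e2 with e1**2 = e2**2 = 0."""

    def __init__(self, a, b=0.0, c=0.0, d=0.0):
        self.a, self.b, self.c, self.d = a, b, c, d

    @staticmethod
    def _lift(o):
        return o if isinstance(o, HyperDual) else HyperDual(float(o))

    def __add__(self, o):
        o = self._lift(o)
        return HyperDual(self.a + o.a, self.b + o.b, self.c + o.c, self.d + o.d)

    __radd__ = __add__

    def __sub__(self, o):
        o = self._lift(o)
        return HyperDual(self.a - o.a, self.b - o.b, self.c - o.c, self.d - o.d)

    def __rsub__(self, o):
        return self._lift(o) - self

    def __mul__(self, o):
        o = self._lift(o)
        # Product rule; the e1*e2 part collects the cross terms b*c' + c*b'
        return HyperDual(self.a * o.a,
                         self.a * o.b + self.b * o.a,
                         self.a * o.c + self.c * o.a,
                         self.a * o.d + self.b * o.c + self.c * o.b + self.d * o.a)

    __rmul__ = __mul__


def hd_exp(x):
    # Chain rule for exp: the e1*e2 part is f'(a) d + f''(a) b c
    e = math.exp(x.a)
    return HyperDual(e, e * x.b, e * x.c, e * (x.d + x.b * x.c))


def toy_energy(x):
    """Toy pair energy sum_{i<j} exp(-r_ij^2); x is a flat list of 3N HyperDuals."""
    n = len(x) // 3
    total = HyperDual(0.0)
    for i in range(n):
        for j in range(i + 1, n):
            r2 = HyperDual(0.0)
            for k in range(3):
                diff = x[3 * i + k] - x[3 * j + k]
                r2 = r2 + diff * diff
            total = total + hd_exp(-1.0 * r2)
    return total


def seeded(coords, p=None, q=None):
    # Seed e1 along coordinate p and e2 along coordinate q
    return [HyperDual(v, 1.0 if k == p else 0.0, 1.0 if k == q else 0.0)
            for k, v in enumerate(coords)]


def gradient(f, coords):
    # dE/dx_p; the forces are the negatives of these values
    return [f(seeded(coords, p=p)).b for p in range(len(coords))]


def hessian(f, coords):
    m = len(coords)
    return [[f(seeded(coords, p, q)).d for q in range(m)] for p in range(m)]
```

This forward-mode approach needs one energy evaluation per Hessian element, which is fine for a toy system. Reverse mode, as in PyTorch, scales far better for the many coordinates of a batch.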

2.3.2. Loss Function

Training to Hessian data requires adding an error metric for the Hessian matrix, such as the root-mean-square error (RMSE), to the loss function of an MLIP alongside the error metrics for energies and forces. The energy RMSE directly evaluates the model's ability to predict the potential energy of a given geometry. The force RMSE (forces being the negative gradient of the energy with respect to atomic positions) measures the model's accuracy in predicting the high-dimensional slope of the potential energy. The Hessian RMSE extends this further by assessing the model's accuracy in predicting the high-dimensional curvature of the energy landscape. This curvature is needed to identify minima, maxima, and saddle points and thus to model properties such as transition states and vibrational frequencies. The three equations below define the energy, force, and Hessian loss terms, respectively. In these equations, P^ref denotes the reference property used as ground truth in training (in our case, the property calculated with DFT), and P^pred denotes the property predicted by our model. The property can be the molecular potential energy E_{T,i}, a force component F_i, or a Hessian element H_i. M is the total number of molecules in the training set, n_F is the total number of force elements in the training set, and n_H is the total number of Hessian elements in the training set.

ε_{E} = \sqrt{\frac{\sum_{i = 1}^{M} \left(E_{T, i}^{\mathrm{ref}} - E_{T, i}^{\mathrm{pred}}\right)^{2}}{M}}

ε_{F} = \sqrt{\frac{\sum_{i = 1}^{n_{F}} \left(F_{i}^{\mathrm{ref}} - F_{i}^{\mathrm{pred}}\right)^{2}}{n_{F}}}

ε_{H} = \sqrt{\frac{\sum_{i = 1}^{n_{H}} \left(H_{i}^{\mathrm{ref}} - H_{i}^{\mathrm{pred}}\right)^{2}}{n_{H}}}
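The three RMSE terms share one form and differ only in which elements are counted. Below is a plain-Python sketch that assumes flattened lists and an optional padding mask. The mask makes n_F and n_H count only real force components and Hessian elements, which is our reading of the definitions above rather than a detail stated in the paper.

```python
import math


def rmse(reference, predicted, mask=None):
    """RMSE over flat sequences, skipping entries whose mask value is False.

    With a padding mask, the divisor counts only real elements (an assumed
    reading of n_F and n_H).
    """
    if mask is None:
        mask = [True] * len(reference)
    squared = [(r - p) ** 2 for r, p, keep in zip(reference, predicted, mask) if keep]
    if not squared:
        raise ValueError("no unmasked elements to compare")
    return math.sqrt(sum(squared) / len(squared))


def flatten(nested):
    """Flatten nested lists (e.g. M x N x 3 forces or M x 3N x 3N Hessians)."""
    if isinstance(nested, (list, tuple)):
        return [v for item in nested for v in flatten(item)]
    return [nested]
```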

However, energies, forces, and Hessian elements differ in physical units and can differ by orders of magnitude in scale within a given problem. Normalization factors are therefore needed so that each component contributes appropriately to the loss function. They rescale the RMSE of each term so that optimization does not disproportionately favor the accuracy of one property over another. The final loss function used in our MLIP model is given in the equation below, where η_F is the normalization factor for the force loss and η_H is the normalization factor for the Hessian loss. The values η_F = 0.08 and η_H = 0.02 were used in our training runs. These values were determined empirically to balance the magnitudes of the energy, force, and Hessian RMSE contributions to the total loss across mini-batches during training. Specifically, we first trained an initial E–F-H model with η_F = η_H = 1 on a training data set and obtained its final energy, force, and Hessian RMSEs. We then calculated the factors that make the scaled final force RMSE (η_F ε_F) and the scaled final Hessian RMSE (η_H ε_H) equal to the final energy RMSE.

L(E^{\mathrm{pred}}, F^{\mathrm{pred}}, H^{\mathrm{pred}}) = ε_{E} + η_{F} ε_{F} + η_{H} ε_{H}
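The total loss and the calibration of its normalization factors amount to a few lines of arithmetic. The sketch below uses illustrative function names. The defaults are the reported factors, and the calibration helper implements the rule described above: choose η so that the scaled force and Hessian RMSEs equal the energy RMSE.

```python
def total_loss(eps_e, eps_f, eps_h, eta_f=0.08, eta_h=0.02):
    """L = eps_E + eta_F * eps_F + eta_H * eps_H.

    Defaults are the factors reported above. Setting eta_h = 0 gives an E-F
    loss, and eta_f = eta_h = 0 gives an energy-only loss.
    """
    return eps_e + eta_f * eps_f + eta_h * eps_h


def balancing_factors(final_eps_e, final_eps_f, final_eps_h):
    """Factors making eta_F * eps_F and eta_H * eps_H equal to eps_E.

    The inputs are the final RMSEs of a calibration run trained with
    eta_F = eta_H = 1, as described in the text.
    """
    return final_eps_e / final_eps_f, final_eps_e / final_eps_h
```

For example, hypothetical final RMSEs of 2.0, 25.0, and 100.0 in a calibration run would give η_F = 0.08 and η_H = 0.02. These numbers are made up and only show the arithmetic.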

2.4. Molecular Dynamics Simulations

To evaluate the dynamical stability of the Hessian-trained MLIP versus a non-Hessian-trained MLIP, we conducted MD simulations. The following subsections present the methods used in this evaluation.

2.4.1. Simulation Setup

MD simulations were performed within the Atomic Simulation Environment (ASE) framework, using the Hessian-trained and non-Hessian-trained MLIPs as force fields. A Langevin thermostat with a friction coefficient of 0.01 fs^–1 was used to control the temperature.

The initial atomic configurations for the simulations were taken from optimized molecular structures. Each system was initialized at 5 K, and a time step of 0.5 fs was used. Each simulation stage ran for 5 ps (10,000 time steps), after which the temperature was increased by 5 K. This iterative heating protocol (a 5 ps simulation followed by a 5 K temperature increase) continued until a predefined failure criterion was met.
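The heating protocol amounts to a simple loop over temperature stages. In the sketch below, `run_stage` is a hypothetical callback standing in for a Langevin MD run of the given length at the given temperature (for example, with ASE) that reports whether the failure criterion was met; it is not part of any library. The `t_max` cap is an addition for this sketch, since the protocol in the text simply continues until failure.

```python
def heating_schedule(t_start=5.0, t_step=5.0, stage_ps=5.0, dt_fs=0.5):
    """Yield (temperature in K, MD steps) for successive heating stages."""
    steps_per_stage = round(stage_ps * 1000.0 / dt_fs)  # 5 ps / 0.5 fs = 10,000
    temperature = t_start
    while True:
        yield temperature, steps_per_stage
        temperature += t_step


def run_until_failure(run_stage, t_max=1000.0, **schedule_kwargs):
    """Return the temperature of the first failed stage, or None.

    run_stage(temperature, n_steps) is a hypothetical callback that runs one
    Langevin MD stage and returns True if the failure criterion was met.
    t_max is a safety cap added for this sketch.
    """
    for temperature, n_steps in heating_schedule(**schedule_kwargs):
        if temperature > t_max:
            return None
        if run_stage(temperature, n_steps):
            return temperature
```

With the default arguments, each stage is 10,000 steps of 0.5 fs. Reaching 500 K takes 100 stages, or 500 ps of simulated time.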

2.4.2. Stability Failure Criteria

In this work, we define a stability failure as follows: a simulation fails at time step t if

\begin{aligned} &\exists (i, j) \ \text{s.t.} \ \frac{1}{50} \sum_{t' = t - 49}^{t} d_{ij}(t') > 1.5\, d_{ij}^{\mathrm{eq}} \\ &\text{or} \\ &\exists (i, j) \ \text{s.t.} \ \frac{1}{50} \sum_{t' = t - 49}^{t} d_{ij}(t') < 0.75\, d_{ij}^{\mathrm{eq}} \end{aligned}

where d_ij(t) is the distance between atoms i and j at time step t and d_ij^eq is the geometry-optimized equilibrium bond distance between atoms i and j. The molecules evaluated in this work are from the MD17 benchmark. Previous ab initio MD simulations of these molecules at 500 K, lasting between 50 and 497 ps, did not show such distortions. Hence, we do not expect a physically accurate potential to produce bond distortions of this size, and thus failures, below 500 K in our simulations.
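The criterion can be checked on the fly during a simulation. The sketch below uses hypothetical names and assumes that the pair distances d_ij are supplied at every step (for example, for bonded pairs only). It keeps a 50-step window per pair and reports failure once any windowed mean rises above 1.5 d_ij^eq or falls below 0.75 d_ij^eq.

```python
from collections import deque


def make_failure_monitor(eq_distances, window=50, upper=1.5, lower=0.75):
    """Build a per-step checker for the windowed bond-distance criterion.

    eq_distances maps an atom pair (i, j) to its equilibrium distance d_ij^eq.
    The returned function takes a dict of current distances d_ij(t) with the
    same keys and returns True once the mean over the last `window` steps of
    any pair exceeds upper * d_ij^eq or falls below lower * d_ij^eq.
    """
    history = {pair: deque(maxlen=window) for pair in eq_distances}

    def step(distances):
        failed = False
        for pair, d_eq in eq_distances.items():
            buffer = history[pair]
            buffer.append(distances[pair])
            if len(buffer) == window:  # the mean needs steps t-49 ... t
                mean = sum(buffer) / window
                if mean > upper * d_eq or mean < lower * d_eq:
                    failed = True
        return failed

    return step
```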

2.5. Computation of Reaction Pathways

To evaluate the ability of Hessian-trained MLIPs to accurately describe reaction pathways and transition states, Nudged Elastic Band (NEB) calculations were performed. The NEB method determines a minimum energy pathway (MEP) by optimizing a set of interpolated molecular geometries, referred to as images, between the reactant and product states. These images are connected by virtual spring forces, which maintain an even distribution along the reaction coordinate and prevent artificial clustering in low-energy regions.

2.5.1. Computational Setup

NEB calculations were carried out using the ASE framework. The reactant and product geometries were first optimized both at the DFT level and at the model level before being used as end points for the NEB calculations. The key computational parameters were as follows:

58 images were used to achieve a smooth representation of the reaction path.
A spring constant of 50 eV/Å² was used for the springs between adjacent images.
The LBFGS optimizer was used for relaxation.
Climbing Image NEB (CI-NEB) was applied to the highest-energy image to accurately locate the transition state (TS).
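The initial band and the spring term of NEB can be sketched in plain Python. This minimal illustration works under stated assumptions:

- geometries are flat coordinate lists;
- the 58 images are assumed to include the two end points;
- the tangent is the simple central difference between neighboring images, rather than the improved tangent estimates available in production NEB codes.

A full NEB step would also add the component of the true force perpendicular to the tangent and, for CI-NEB, invert the parallel true force on the climbing image. Neither is shown, and ASE's own NEB implementation is not used here.

```python
import math


def interpolate_images(reactant, product, n_images):
    """Linearly interpolate n_images geometries, end points included."""
    return [[r + (p - r) * k / (n_images - 1) for r, p in zip(reactant, product)]
            for k in range(n_images)]


def spring_force(images, i, k_spring):
    """Spring force on interior image i, projected on a central tangent.

    F_i = k * (|R_{i+1} - R_i| - |R_i - R_{i-1}|) * tau_i
    """
    prev, cur, nxt = images[i - 1], images[i], images[i + 1]
    tangent = [b - a for a, b in zip(prev, nxt)]
    norm = math.sqrt(sum(t * t for t in tangent))
    tau = [t / norm for t in tangent]
    d_next = math.dist(nxt, cur)
    d_prev = math.dist(cur, prev)
    return [k_spring * (d_next - d_prev) * t for t in tau]


if __name__ == "__main__":
    band = interpolate_images([0.0, 0.0, 0.0], [1.0, 2.0, 2.0], 58)
    print(len(band), spring_force(band, 10, 50.0))  # evenly spaced: ~zero force
```

On an evenly spaced band the spring force vanishes. Moving an image toward its forward neighbor produces a spring force that pulls it back along the tangent, which is how the springs keep the images evenly distributed.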

2.6. Calculation of Reaction Vibrational Spectra

To calculate the vibrational spectra along a reaction, the Hessian matrix at each step of the MEP is obtained in one of two ways. The DFT Hessian comes from a Gaussian16 frequency calculation, and the model Hessian comes from automatic differentiation of the energy predicted by the Hessian-trained MLIP. Vibrational frequencies were obtained by diagonalizing both the DFT-generated and the model-generated Hessian matrices at each point along the reaction coordinate. The normal modes describe molecular distortions, while their frequencies provide a spectroscopic fingerprint of the system at each step.

To ensure a direct comparison between MLIP and DFT vibrational frequencies, we extracted the raw Cartesian Hessians from Gaussian16 and computed vibrational modes by diagonalizing the mass-weighted Hessian matrices ourselves, without applying translational or rotational mode projections. The same procedure was used for MLIP-predicted Hessians, ensuring a consistent and unbiased frequency comparison.
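The frequency analysis described above diagonalizes the mass-weighted Cartesian Hessian without projecting out translations and rotations. It can be sketched with a small Jacobi eigenvalue routine so that the example needs no external libraries; in practice one would call a LAPACK-backed routine such as numpy.linalg.eigh. The one-dimensional spring Hessians used in the checks are toy assumptions. The `masses` argument holds one mass per Cartesian coordinate, so each atomic mass would appear three times for a 3D Hessian. Negative eigenvalues are reported as negative wavenumbers, the usual convention for imaginary modes.

```python
import math

N_A = 6.02214076e23          # Avogadro constant, 1/mol
AMU_KG = 1.66053906660e-27   # atomic mass unit, kg
C_CM_S = 2.99792458e10       # speed of light, cm/s
# Converts an eigenvalue in kcal mol^-1 A^-2 amu^-1 to s^-2
TO_PER_S2 = 4184.0 / (N_A * 1e-20 * AMU_KG)


def mass_weight(hessian, masses):
    """Divide H_ab by sqrt(m_a m_b); masses holds one mass per coordinate."""
    return [[h / math.sqrt(ma * mb) for h, mb in zip(row, masses)]
            for row, ma in zip(hessian, masses)]


def jacobi_eigenvalues(matrix, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    a = [list(row) for row in matrix]
    n = len(a)
    scale = sum(v * v for row in a for v in row) or 1.0
    for _ in range(max_sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(n) if p != q)
        if off <= 1e-24 * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def wavenumbers(hessian, masses):
    """Wavenumbers (cm^-1) from a Cartesian Hessian in kcal mol^-1 A^-2.

    No projection of translations or rotations is applied; imaginary modes
    are returned as negative numbers.
    """
    result = []
    for lam in jacobi_eigenvalues(mass_weight(hessian, masses)):
        nu = math.sqrt(abs(lam) * TO_PER_S2) / (2.0 * math.pi * C_CM_S)
        result.append(math.copysign(nu, lam))
    return result
```

Because no projection is applied, a free one-dimensional diatomic gives one near-zero translational frequency alongside its stretch, whose mass-weighted eigenvalue is k/μ with μ the reduced mass.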

3. Results and Discussion

This section evaluates how the ANI machine learning interatomic potential (MLIP) performs when trained with increasingly rich information, across increasingly stringent validation scenarios. The three levels of information are single-point energies (E); energies and atomic forces (E–F); and energies, forces, and Hessian matrices (E–F-H). The analysis begins by training an ensemble of 100 ANI models to a single reaction transition state (one data point), to understand how higher-order data affect the models' ability to extrapolate to the complete IRC pathway. The evaluation then expands to a set of models trained to a diverse data set of stationary points, that is, reactants, transition states, and products (hereafter referred to as the RTP training set and RTP models). We assess the extrapolation capabilities of these models across various molecular configurations and reaction pathways, including critical points, IRCs, molecular dynamics trajectories, vibrational frequencies, and perturbed structures along the IRC. We then explore the data efficiency and computational demands of each approach.

3.1. Impact on Extrapolation

In this section, we experimentally test the hypothesis (see the overview figure in Section 1) that Hessian data improve the extrapolation performance of the neural-network-based ANI MLIP. We train three types of models with different loss functions: energy (E), energy–force (E–F), and energy-force-Hessian (E–F-H). In terms of the total loss function defined in Section 2.3.2, the E models have η_F = η_H = 0, the E–F models have a nonzero η_F and η_H = 0, and the E–F-H models have nonzero η_F and η_H. For each loss, an ensemble of 100 ANI models is trained to a single data point, the transition state of the reaction H₂O + C₃H₆O → CH₃OCH₂CH₂OH. We then compare how well the three ensembles (E, E–F, and E–F-H) predict discrete points along the entire IRC from reactant to product.

In panel (a) of the figure below, the models predict the energy of the TS point well. However, as expected, the farther a test structure is from the TS, the worse the models perform, and panel (b) shows that all models perform poorly for predictions far from the TS. Both the E and E–F models show increasing uncertainty and error starting at about ±0.25 IRC units (u^1/2 Å) from the TS.

Energy predictions for the 30 molecular structures in the IRC closest to the TS structure (a) with a line plot of the mean errors of the models (b). The energy fitting, energy-force fitting, and energy-force-Hessian fitting models are represented with the colored markers and colored error areas. DFT-calculated energies are shown as black circles in (a).

The E–F-H models follow the reference potential energies better, especially on the right side of the TS, with lower uncertainties and errors until +0.6 IRC units. However, they infer an erroneous energy increase below −1.0 IRC units, as seen in Figure S4. Since the models were trained on a single data point in the IRC, we do not expect reasonable predictions far from the training point.

The figure above details the energy predictions for the 30 points closest to the TS, including a stacked mean error plot. The E models diverge from the TS energy sooner than the E–F models. This indicates that models trained with a force term learn the zero gradient of the PES at the TS and keep the energy nearly constant around it.

The E–F-H models are the only ensemble that learned the shape of the PES around the TS, with most predictions lower than the TS energy, indicating they learned the TS represents a local maximum. Despite some random behavior in energy predictions due to weight initialization, as shown in Figure S5, the majority of E–F-H models retain information about the PES shape near the TS. The evidence provided by this test case shows a strong tendency for Hessian training to enable local extrapolation on the potential energy surface compared to training without it.

However, while the E–F-H model closely follows the expected behavior near the TS, it tends to overflatten the PES along one direction of the reaction coordinate. This behavior may stem from the limited spatial information provided during training, as the model is exposed only to local curvature without global PES constraints. Interestingly, this overflattening is qualitatively similar to the behavior of ANI ensemble models (see Figure S5), which also tend to plateau when extrapolating far from training geometries. Including a quadratic baseline in future work may help clarify how much of the extrapolation behavior is driven by the local Hessian.
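The quadratic baseline mentioned above would be the second-order Taylor expansion of the PES around the training geometry, E(x) ≈ E_0 + gᵀΔx + ½ ΔxᵀHΔx. It is built from the same energy, gradient, and Hessian labels used in E–F-H training. The sketch is illustrative; the paper does not describe a specific baseline implementation.

```python
def quadratic_baseline(e0, grad, hess, x0, x):
    """Second-order Taylor model E0 + g.dx + 0.5 dx^T H dx around geometry x0.

    e0, grad and hess are the energy, energy gradient (minus the forces) and
    Hessian at x0; x is the geometry at which the baseline is evaluated.
    """
    dx = [xi - ri for xi, ri in zip(x, x0)]
    linear = sum(g * d for g, d in zip(grad, dx))
    quadratic = 0.5 * sum(dx[a] * hess[a][b] * dx[b]
                          for a in range(len(dx)) for b in range(len(dx)))
    return e0 + linear + quadratic
```

Consider the toy double-well surface (x² − 1)² + y², whose saddle point at the origin has E_0 = 1, zero gradient, and Hessian diag(−4, 2). There the baseline at (0.1, 0) is 0.98, close to the exact 0.9801. It decreases along the negative-curvature direction, as a transition state model should, and increases along the positive one.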

3.2. Testing the Models on Held-Out Reactants, Transition States, and Products

In this section, the MLIP models are trained with the RTP data set, consisting of 35,087 molecular geometries from 11,961 elementary chemical reactions, and tested using a reserved set of reactants, transition states, and products from the same data set. This phase assesses the models’ capacity to generalize to unseen chemical systems by predicting the potential energies of stationary points (reactants, transition states, and products). Our train/test split was performed at the geometry level rather than at the reaction or compound level. Although specific stationary point geometries in the test set do not appear in the training set, other geometries from the same reactions might, reducing the challenge of predicting unseen stationary points. However, we do not expect this to affect the significance of our model comparison.
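The difference between a geometry-level split and a reaction-level (grouped) split can be illustrated with a small hypothetical sketch. Each record is a (reaction_id, role) pair; the synthetic data set, the 20% test fraction, and the helper names are assumptions for illustration only. A geometry-level split shuffles individual geometries, so a test TS can share a reaction with a training reactant or product; a grouped split assigns whole reactions to one side.

```python
import random


def geometry_level_split(records, test_frac, seed=0):
    """Shuffle individual geometries; reactions may straddle train and test."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def reaction_level_split(records, test_frac, seed=0):
    """Assign whole reactions to train or test, so no reaction is shared."""
    rng = random.Random(seed)
    reactions = sorted({rxn for rxn, _ in records})
    rng.shuffle(reactions)
    test_rxns = set(reactions[: int(len(reactions) * test_frac)])
    train = [r for r in records if r[0] not in test_rxns]
    test = [r for r in records if r[0] in test_rxns]
    return train, test


def shared_reactions(train, test):
    """Reactions that contribute geometries to both sides of the split."""
    return {rxn for rxn, _ in train} & {rxn for rxn, _ in test}


# Hypothetical data set: 100 reactions, each with a reactant, TS, and product.
records = [(i, role) for i in range(100) for role in ("R", "TS", "P")]

geo_train, geo_test = geometry_level_split(records, 0.2)
rxn_train, rxn_test = reaction_level_split(records, 0.2)
print(len(shared_reactions(geo_train, geo_test)))  # typically > 0
print(len(shared_reactions(rxn_train, rxn_test)))  # always 0
```

Since all three model types are evaluated on the same split, any leakage of this kind affects them alike, which is why it is not expected to change the ranking of the models.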

Table presents the RMSE for each loss function used in our models. The RMSE for the E–F model's energy predictions is 4.29 kcal mol^–1, higher than the E model's 3.83 kcal mol^–1. This increase likely reflects the added complexity of training the model to also reproduce the correct slope of the potential surface, which minimizes random behavior. The E–F-H model, however, shows a lower energy RMSE of 3.67 kcal mol^–1.

Table 2. Energy, Force, and Hessian RMSEs of Each Model Fitting Tested on 35,087 Reactants, Transition States, and Products in the Database.

training	energy RMSE (kcal mol^–1)	force RMSE (kcal Å^–1 mol^–1)	Hessian RMSE (kcal Å^–2 mol^–1)
E fitting	3.83 ± 0.23	53.14 ± 2.90	208.58 ± 15.17
E–F fitting	4.29 ± 0.18	4.87 ± 0.10	146.32 ± 1.90
E–F-H fitting	3.67 ± 0.11	5.61 ± 0.16	12.76 ± 0.24


Interestingly, the E–F model outperforms the E–F-H model in force prediction (4.87 vs 5.61 kcal Å^–1 mol^–1) on this set of stationary test points. This unexpected result could be due to overfitting, in which the E–F model predicts a flatter potential across phase space; because the reference forces at stationary points are close to zero, a flat potential is rewarded in this benchmark. This hypothesis is tested below as the benchmark cases move further from equilibrium structures, where reference forces are nonzero. The E model has substantially higher force errors, indicating severe overfitting: its slope of the potential surface at stationary points is close to random.

For the Hessian prediction task, the E and E–F models have RMSEs roughly 16 and 11 times higher, respectively, than the E–F-H model, providing initial evidence of a high degree of overfitting in both the E and E–F models.

3.3. Testing the Model on IRC Structures

In this benchmark, the E, E–F, and E–F-H models trained on the RTP data set are evaluated on structures derived from IRC paths, representing the minimum energy path from reactants, through transition states, to products.

Table quantifies the prediction accuracy for IRC structures using the RMSE metric for energy, force, and Hessian predictions compared to density functional theory reference calculations. The results show that adding higher-order information (forces and Hessian) to the model increases accuracy for energy predictions along the minimum energy pathway. Adding a force term to the loss function reduces the energy prediction error by 31.3% to 7.30 kcal mol^–1, while adding both forces and Hessian terms reduces it by 38.3% to 6.55 kcal mol^–1 compared to the E model (10.62 kcal mol^–1).
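The percentage reductions quoted above follow from 100 × (RMSE_ref − RMSE_new)/RMSE_ref. The small helper below (its name is illustrative) recomputes them from the Table 3 energy RMSEs, together with the 13.9% force-RMSE improvement of the E–F-H model over the E–F model.

```python
def percent_reduction(reference, new):
    """Relative RMSE reduction of `new` with respect to `reference`, in percent."""
    return 100.0 * (reference - new) / reference


# Energy RMSEs on IRC structures (kcal/mol), from Table 3
e_only, e_f, e_f_h = 10.62, 7.30, 6.55
print(f"E-F vs E:   {percent_reduction(e_only, e_f):.1f}%")    # 31.3%
print(f"E-F-H vs E: {percent_reduction(e_only, e_f_h):.1f}%")  # 38.3%

# Force RMSEs on IRC structures (kcal/mol/Å), from Table 3
print(f"E-F-H vs E-F (forces): {percent_reduction(8.48, 7.30):.1f}%")  # 13.9%
```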

Table 3. Energy, Force, and Hessian RMSEs of Each Model Fitting Tested on 34,248 Structures in the IRCs of Around 574 Reactions (see Supporting Information, Figure S17).

training	energy RMSE (kcal mol^–1)	force RMSE (kcal Å^–1 mol^–1)	Hessian RMSE (kcal Å^–2 mol^–1)
E fitting	10.62 ± 0.37	58.06 ± 2.79	218.79 ± 18.09
E–F fitting	7.30 ± 0.15	8.48 ± 0.11	143.59 ± 1.77
E–F-H fitting	6.55 ± 0.14	7.30 ± 0.10	14.66 ± 0.12


Because the training set contains only stationary points (reactants, transition states, and products), predicting the intermediate structures along a reaction path is a nontrivial result, although it remains an interpolation task. As expected, the E model performs poorly in force prediction, with an RMSE of 58.06 kcal Å^–1 mol^–1. The E–F-H model (RMSE 7.30 kcal Å^–1 mol^–1), however, outperforms the E–F model (8.48 kcal Å^–1 mol^–1) by 13.9% in force prediction, suggesting that the E–F model's better performance in the stationary-point benchmark was due to overfitting. This trend continues in the other benchmark cases.

For Hessian prediction, both the E and E–F models perform poorly, with RMSEs about 15 and 10 times higher, respectively, than that of the E–F-H model, further indicating overfitting in the E–F model.

3.4. Testing the Model on NMS Structures outside of the IRC Path

This section aims to evaluate the extrapolation capabilities of the MLIP models trained on the RTP data set. The models are tested on perturbed structures derived from IRC pathways and normal mode sampling (NMS), which introduces thermodynamic variability in structural phase space. NMS generates structures which are randomly displaced along every normal mode of a given structure along the IRC, except the mode corresponding to the IRC path direction, providing a range of configurations from subtle to significant perturbations relative to the original IRC structures.
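A minimal sketch of normal mode sampling is shown below, assuming precomputed orthonormal normal-mode vectors and force constants. The amplitude rule, a random sign times sqrt(3 c_i N_atoms k_B T / K_i) with non-negative random weights c_i whose sum lies in [0, 1), is modeled on the scheme described for the original ANI-1 data set; whether the authors used exactly this rule, the way the c_i are drawn, and the temperature are all assumptions. As described above, the mode along the IRC direction is left untouched.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal mol^-1 K^-1


def normal_mode_sample(coords, modes, force_consts, skip_mode, temperature, rng):
    """Displace flat Cartesian coords (length 3N, Å) along every mode except `skip_mode`.

    modes        : orthonormal displacement vectors, each of length 3N
    force_consts : force constant K_i of each mode, kcal mol^-1 Å^-2
    skip_mode    : index of the mode along the IRC direction (not displaced)
    """
    n_atoms = len(coords) // 3
    # Skip the IRC-direction mode and, as an assumption of this sketch, any mode
    # with a non-positive force constant (the amplitude formula is undefined there).
    active = [i for i in range(len(modes)) if i != skip_mode and force_consts[i] > 0.0]
    if not active:
        return list(coords)
    raw = [rng.random() for _ in active]
    scale = rng.random() / sum(raw)
    weights = [scale * r for r in raw]  # c_i >= 0 with sum(c_i) in [0, 1)
    displaced = list(coords)
    for c, i in zip(weights, active):
        amplitude = math.sqrt(3.0 * c * n_atoms * K_B * temperature / force_consts[i])
        amplitude *= rng.choice((-1.0, 1.0))  # random sign
        for k in range(len(displaced)):
            displaced[k] += amplitude * modes[i][k]
    return displaced


# Toy usage: one "atom" with its Cartesian axes as modes; mode 0 plays the IRC direction.
rng = random.Random(0)
modes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
sample = normal_mode_sample([0.0, 0.0, 0.0], modes, [100.0, 100.0, 100.0],
                            skip_mode=0, temperature=300.0, rng=rng)
print(sample)
```

Stiffer modes (larger K_i) receive smaller displacements, so the sampled structures stay within a thermally plausible energy window while still covering subtle to significant perturbations.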

Table provides the RMSEs for energy, force, and Hessian predictions for all models. The model incorporating energies, forces, and Hessian terms in the loss function outperforms the others, reducing the energy RMSE to 13.52 kcal mol^–1, a 37.6% reduction compared to the model trained with energy and forces (21.67 kcal mol^–1) and a 70.8% reduction compared to the model trained only with energies (46.38 kcal mol^–1). This trend holds for both force and Hessian prediction tasks, where the Hessian-trained model, despite higher RMSEs than in previous benchmarks, greatly outperforms models without Hessian fitting.

Table 4. Energy, Force, and Hessian RMSEs of Ensembles of Each Model Fitting Tested on 62,527 Perturbed Structures Along the IRCs of Randomly Selected Reactions via NMS.

training	energy RMSE (kcal mol^–1)	force RMSE (kcal Å^–1 mol^–1)	Hessian RMSE (kcal Å^–2 mol^–1)
E fitting	46.38 ± 2.35	81.97 ± 3.64	231.07 ± 23.91
E–F fitting	21.67 ± 0.75	26.53 ± 0.35	128.09 ± 1.27
E–F-H fitting	13.52 ± 0.21	13.47 ± 0.29	37.82 ± 1.77


The nearly 2-fold improvement in force RMSE for the E–F-H model compared to the E–F model (13.47 vs 26.53 kcal Å^–1 mol^–1) provides further evidence that the E–F model was overfit when it outperformed the E–F-H model in the held-out reactants, transition states, and products benchmark. This superior performance underscores the value of the Hessian matrix in providing critical information on the curvature of the energy landscape, improving extrapolation away from the minimum-energy pathway of the IRC. Finally, as expected, the E–F-H model greatly outperforms the others in Hessian RMSE.

3.5. An Evaluation of Extrapolation via Stability in Molecular Dynamics Simulations

In this section, we compare the RTP-trained MLIP models in molecular dynamics (MD) simulations to evaluate the dynamical stability of E, E–F, and E–F-H model ensembles on 10 molecules from the MD17 data set. This approach is inspired by Fu et al. The MD17 data set, consisting of small organic molecules with DFT-optimized geometries and properties simulated at 500 K, is commonly used for MLIP performance benchmarks. We selected an optimized configuration of each molecule as the starting point for stability evaluation simulations, with setup and failure criteria described in the Methodology section.

Reaching higher simulation temperatures without observing nonphysical behavior is a measure of a model's stability and extrapolation capability. In this NVT dynamics test, starting from the optimized initial geometry, the temperature is initialized at 5 K and then slowly ramped by 5 K every 5 ps. At higher temperatures, molecular systems sample a wider range of microstates, including those far from equilibrium. Here we propose that a more stable MLIP model is one that sustains simulations at higher temperatures without nonphysically breaking bonds or exhibiting close contacts between atoms (i.e., without “failing”). This stability measure also reflects the model's ability to generalize beyond its training data and capture the dynamics of molecular systems. This is especially relevant here because all models tested are RTP-trained MLIPs, trained only on critical points of the potential energy surface (minima and transition states, where atomic forces are close to zero).
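The temperature schedule described above (start at 5 K, raise by 5 K every 5 ps) and a distance-based failure check can be sketched as follows. This reads the ramp as a stepwise thermostat setpoint, which is an assumption. The failure thresholds (a starting bond stretched beyond 1.5 times its initial length, or any atom pair closer than 0.7 Å) are hypothetical placeholders; the criteria actually used are those given in the Methodology section.

```python
import math


def target_temperature(t_ps, t_start=5.0, step_k=5.0, interval_ps=5.0):
    """Thermostat setpoint (K) at simulation time t_ps for a stepwise ramp."""
    return t_start + step_k * math.floor(t_ps / interval_ps)


def has_failed(positions, bonds, ref_lengths, stretch_factor=1.5, min_contact=0.7):
    """Hypothetical failure test on Cartesian positions (Å).

    bonds       : (i, j) atom pairs bonded in the starting geometry
    ref_lengths : the corresponding starting bond lengths (Å)
    """
    # Nonphysical bond breaking: a bond stretched far beyond its starting length
    for (i, j), r0 in zip(bonds, ref_lengths):
        if math.dist(positions[i], positions[j]) > stretch_factor * r0:
            return True
    # Close contact: any two atoms unphysically near each other
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) < min_contact:
                return True
    return False


print(target_temperature(0.0), target_temperature(495.0))
print(has_failed([(0.0, 0.0, 0.0), (0.74, 0.0, 0.0)], [(0, 1)], [0.74]))
```

Under this reading, the setpoint first reaches 500 K at 495 ps and 2100 K at 2095 ps, so the temperature at failure and the simulation time at failure carry the same information.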

Figure shows the average simulation temperatures reached before failure for each model and molecule in the MD17 data set. The average simulation times before failure are also shown above their corresponding bars, with data tabulated in Table S1. Models trained with energies and forces (E–F) exhibited similar stability to energy-only (E) models. However, significant stability improvements were observed in energy–force-Hessian (E–F-H) models. These models maintained MD simulations of most MD17 molecules at temperatures above 500 K (dashed orange line in Figure ) and for significantly longer times than other models. The only molecule that failed below 500 K, the temperature at which the MD17 trajectories were generated, was azobenzene, probably due to the lack of examples of hydrogen-benzene interactions in our data set.

Temperatures reached before failure for each molecule in the MD17 data set using ensembles of MLIP models trained with energy-only (E), energy-force (E–F), and energy–force-Hessian (E–F-H) loss functions. The height of each bar is the temperature reached before failure for that molecule, and the simulation time reached is shown above each bar. Simulations that failed during the initial optimization are shown in red with a time of 0.0 ps. The dashed orange horizontal line marks 500 K, the temperature at which the MD17 trajectories were generated.

While our Hessian-trained models show significant improvements in MD stability, sustaining trajectories up to 2100 K in some cases, it is important to note that MD-based stability benchmarks are sensitive to simulation conditions, random initialization, and small numerical differences. It is also important to clarify that the simulations were run only once per molecule per model. We therefore treat these results as comparative indicators rather than absolute metrics.

The stability improvements highlight the value of Hessian data in enhancing the MLIP's representation of the PES. By incorporating second-order information, E–F-H models better capture the local shape of the PES around training geometries, enabling better extrapolation to the higher-energy configurations encountered during MD simulations. These findings suggest that Hessian matrices can play a critical role in developing robust and reliable MLIPs for molecular simulations, particularly at elevated temperatures and over extended time scales when the training data set is relatively small.

3.6. NEB Analysis: Reaction Pathways and Barrier Predictions

To assess the accuracy and extrapolation capabilities of the Hessian-trained MLIP in modeling chemical reaction mechanisms, we performed NEB calculations of an intramolecular single hydrogen transfer reaction and compared the predicted reaction barriers to those obtained from DFT calculations. The analysis focused on evaluating the MEP, the accuracy of transition state geometries, and the smoothness of the PES.
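To make the NEB procedure concrete, the sketch below applies it to a toy two-dimensional double-well surface, V(x, y) = (x² − 1)² + y², which has minima at (±1, 0) and a first-order saddle at (0, 0) with a barrier of 1 (arbitrary units). It is not the authors' setup or a molecular system. For brevity, it uses a simple central-difference tangent, plain steepest-descent updates, a fixed spring constant, and no climbing image. Each interior image feels the component of the true force perpendicular to the path plus a spring force along the path, so the images relax onto the MEP and the highest image approximates the saddle point.

```python
import math


def potential(x, y):
    """Toy double well: minima at (+-1, 0), first-order saddle at (0, 0), barrier 1."""
    return (x * x - 1.0) ** 2 + y * y


def gradient(x, y):
    return 4.0 * x * (x * x - 1.0), 2.0 * y


def neb(n_images=9, k_spring=1.0, step=0.01, n_iter=3000):
    # Initial guess: a curved path between the minima, deliberately off the MEP.
    path = []
    for i in range(n_images):
        s = i / (n_images - 1)
        path.append([-1.0 + 2.0 * s, 0.5 * math.sin(math.pi * s)])
    for _ in range(n_iter):
        new_path = [p[:] for p in path]
        for i in range(1, n_images - 1):  # endpoints stay fixed at the minima
            prev, cur, nxt = path[i - 1], path[i], path[i + 1]
            # Central-difference tangent, normalized
            tx, ty = nxt[0] - prev[0], nxt[1] - prev[1]
            norm = math.hypot(tx, ty)
            tx, ty = tx / norm, ty / norm
            # True force with its component along the tangent removed
            gx, gy = gradient(*cur)
            g_par = gx * tx + gy * ty
            fx, fy = -(gx - g_par * tx), -(gy - g_par * ty)
            # Spring force acts only along the tangent and evens out image spacing
            d_next = math.hypot(nxt[0] - cur[0], nxt[1] - cur[1])
            d_prev = math.hypot(cur[0] - prev[0], cur[1] - prev[1])
            f_spring = k_spring * (d_next - d_prev)
            new_path[i][0] = cur[0] + step * (fx + f_spring * tx)
            new_path[i][1] = cur[1] + step * (fy + f_spring * ty)
        path = new_path
    return path


path = neb()
energies = [potential(x, y) for x, y in path]
barrier = max(energies) - energies[0]
print(f"barrier = {barrier:.4f}")
```

In a molecular NEB the same force projection is applied in 3N-dimensional Cartesian space with energies and forces from the MLIP; an irregular or noisy predicted PES then shows up directly as poorly converged images, as described for the E and E–F models below.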

The NEB calculations using the E-only and E–F trained models failed to converge to a stable reaction pathway, producing highly irregular PESs with energy fluctuations and multiple peaks, preventing the identification of a well-defined transition state. The E-only model often collapsed the reaction pathway, skipping intermediate geometries and converging to unphysical structures. The E–F model showed minor improvements but still failed to yield a continuous MEP, indicating that force training alone was insufficient to stabilize the transition state search.

In contrast, the E–F-H trained model, incorporating energy, forces, and Hessian matrix information, successfully converged to a well-defined reaction pathway. The single-step reaction profile and barrier height were accurately reproduced (Figure ), with the TS structure located at the saddle point. The predicted reaction barrier of 63.63 kcal mol^–1 closely matched the DFT barrier of 63.99 kcal mol^–1. However, for reactions involving simultaneous processes (e.g., two proton transfers and a C–O bond cleavage), the model predicted sequential multistep processes, as reflected in the shape of the NEB barrier (Figure S16). Despite this, the model accurately reproduced the barrier height. We hypothesize that extending the data set to include such reactions could improve the model's accuracy for multiprocess reactions.

Energy profiles obtained from NEB calculations using models trained with the E–F-H loss function (blue dots) alongside DFT reference values (black dots). The x axis gives the geometric distances between intermediates in Intrinsic Coordinates, with a reference value of zero assigned to the TS geometry (dashed vertical red line). The y axis gives the energies in kcal mol^–1. Atom coloring follows the CPK convention (red for oxygen, blue for nitrogen, gray for carbon, and white for hydrogen).

The convergence failure of the E-only and E–F models points to the value of incorporating higher-order derivative information (i.e., the Hessian) when training MLIPs to model reactive pathways. In our NEB experiments, only the E–F-H model stably reproduced minimum energy paths (MEPs) and located transition states for both the single-step and multistep chemical reactions. While recent large-scale models such as MACE-MP-0 and AIMNet2 have demonstrated reasonable capabilities in transition state identification and extrapolation without Hessians, our results indicate that explicitly training on Hessian matrices markedly improves robustness and accuracy for small, chemically reactive systems, particularly under data-constrained conditions. The E–F-H model's accurate reproduction of both the reaction barrier height and shape (Figure ) suggests that Hessian-informed training is a powerful approach for enhancing model reliability in reaction simulations. These findings support the integration of second-order information in MLIP development for applications involving reaction kinetics and catalysis.

3.7. Vibrational Spectra as a Test of Hessian Accuracy

Vibrational frequency analysis offers a stringent benchmark for evaluating the accuracy of second-order derivatives predicted by MLIPs. Although models trained solely on energies and forces can estimate vibrational spectra, directly training on Hessian matrices may improve accuracy and stability, particularly near chemically reactive configurations. To assess this, we compared the vibrational frequencies computed from DFT-calculated Hessians to those predicted by our E–F-H trained model across 60 geometries along a minimum energy path (MEP) obtained from our earlier NEB analysis. These comparisons are shown in Figure a,b.
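Harmonic frequencies are obtained from a Hessian by mass-weighting it, diagonalizing it, and converting the eigenvalues to wavenumbers. A dependency-free sketch is below, assuming a Hessian in kcal mol^–1 Å^–2 and masses in amu. It uses a textbook cyclic Jacobi eigenvalue iteration rather than a library solver, does not project out translations and rotations, and, by common convention, reports imaginary frequencies (negative eigenvalues, as for the reaction-coordinate mode at a TS) as negative numbers. The function names are illustrative.

```python
import math

# Converts sqrt(kcal mol^-1 Å^-2 amu^-1) to wavenumbers (cm^-1); about 108.59
_KCAL_PER_MOL_TO_J = 4184.0 / 6.02214076e23
_AMU_TO_KG = 1.66053906660e-27
_C_CM_PER_S = 2.99792458e10
TO_WAVENUMBER = math.sqrt(_KCAL_PER_MOL_TO_J * 1e20 / _AMU_TO_KG) / (2.0 * math.pi * _C_CM_PER_S)


def jacobi_eigenvalues(matrix, tol=1e-18, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    a = [list(row) for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen to zero the (p, q) element
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def frequencies_cm1(hessian, coord_masses):
    """Harmonic frequencies (cm^-1); coord_masses holds one mass per coordinate
    (each atomic mass repeated three times for a 3D molecule)."""
    n = len(hessian)
    mw = [[hessian[i][j] / math.sqrt(coord_masses[i] * coord_masses[j]) for j in range(n)]
          for i in range(n)]
    return [math.copysign(TO_WAVENUMBER * math.sqrt(abs(lam)), lam)
            for lam in jacobi_eigenvalues(mw)]


# 1D two-mass spring (k = 500 kcal mol^-1 Å^-2, masses 12 and 16 amu)
print(frequencies_cm1([[500.0, -500.0], [-500.0, 500.0]], [12.0, 16.0]))
```

For the two-mass example, one eigenvalue corresponds to free translation (near zero) and the other to the stretch, with frequency TO_WAVENUMBER·sqrt(k/μ), where μ is the reduced mass.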

Vibrational frequencies of the molecular system along the reaction path, computed for the NEB images. The top subplot (a) shows the frequencies of the vibrational modes computed from DFT-calculated Hessians, and the bottom subplot (b) shows the vibrational modes obtained from E–F-H model predictions. In both cases, one vibrational mode has a minimum at the TS structure, reflecting the characteristic softening of the reaction-coordinate mode at the saddle point.

Our results show that the E–F-H model reproduces most of the DFT-level frequencies, which usually require significant computational resources. The DFT frequency calculations took an average of 301 s of CPU time per image, totaling 18,071 s for the 60 images. In contrast, the E–F-H model computed the atomic forces and Hessian matrices for all 60 images in 0.478 s of CPU time, a speedup of roughly 3.8 × 10^4 (more than 4 orders of magnitude) in generating vibrational spectra. Although the model reproduces the vibrational frequencies from only the molecular structure and atomic identities at a fraction of the cost of DFT, its frequency predictions contain additional noise, possibly reflecting model uncertainty in the predicted Hessians.

3.8. Improved Data Efficiency

As shown in previous benchmarks, integrating the Hessian matrix into MLIP training data enhances predictive accuracy and data efficiency. By examining learning curves for energy, force, and Hessian RMSE versus training data volume, alongside average time per epoch (Figure ), we gain insights into the practical implications of incorporating second-order derivatives.

Energy, force, and Hessian root mean squared errors and average training time per epoch versus training data volume. The models shown in these figures were trained on the reactants, transition states, and products of 11,961 elementary chemical reactions and tested on perturbed structures away from the minimum energy pathway, generated through normal mode sampling (NMS) of the intermediate structures of randomly selected reactions in the data set (62,527 NMS structures in total).

Including the Hessian matrix increases the model’s learning complexity but provides richer training information. RMSE plots in Figure a–c show that models trained with Hessian information achieve lower errors with fewer training data compared to models without it. This suggests that the Hessian matrix offers critical insights into the energy landscape, increasing the information yield per example.
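One way to make the "information yield per example" concrete is to count scalar labels per geometry. For a molecule with N atoms, a single structure carries 1 energy, 3N force components, and, because the Hessian is symmetric, 3N(3N + 1)/2 unique second-derivative entries. The helper below (its name is illustrative) does only this bookkeeping; it says nothing about how informative or how correlated the individual labels are.

```python
def labels_per_structure(n_atoms):
    """Count scalar training targets per geometry: energy, forces, unique Hessian entries."""
    dof = 3 * n_atoms
    return {
        "energy": 1,
        "forces": dof,
        "hessian_unique": dof * (dof + 1) // 2,  # upper triangle of a symmetric 3N x 3N matrix
    }


for n in (3, 10, 20):
    print(n, labels_per_structure(n))
```

For a 10-atom molecule this gives 1 energy, 30 force components, and 465 unique Hessian entries per structure.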

Figure a shows that including Hessian information significantly improves data efficiency. The E–F-H model, trained on energies, forces, and Hessian matrices, achieves a significantly lower RMSE in energy predictions using only 2% of the data set volume compared to models trained with energy and energy-force loss functions using 80% of the data set.
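The E, E–F, and E–F-H fits differ only in which terms enter the loss. A generic weighted mean-squared-error form is sketched below in plain Python; the weights w_e, w_f, and w_h and the per-term averaging are hypothetical choices, not the exact loss or weighting used to train the ANI models in this work. Setting w_f = w_h = 0 gives an energy-only fit, and w_h = 0 gives an energy-force fit.

```python
def mse(pred, ref):
    """Mean squared error over two equal-length sequences."""
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)


def flatten(matrix):
    return [x for row in matrix for x in row]


def efh_loss(pred, ref, w_e=1.0, w_f=0.0, w_h=0.0):
    """Weighted sum of energy, force, and Hessian squared-error terms for one structure.

    pred/ref: dicts with "energy" (float), "forces" (list of 3N floats),
    and "hessian" (3N x 3N nested list).
    """
    loss = w_e * (pred["energy"] - ref["energy"]) ** 2
    if w_f:
        loss += w_f * mse(pred["forces"], ref["forces"])
    if w_h:
        loss += w_h * mse(flatten(pred["hessian"]), flatten(ref["hessian"]))
    return loss


ref = {"energy": 0.0, "forces": [0.0, 0.0], "hessian": [[1.0, 0.0], [0.0, 1.0]]}
pred = {"energy": 1.0, "forces": [1.0, 1.0], "hessian": [[2.0, 0.0], [0.0, 1.0]]}
print(efh_loss(pred, ref))                     # E-only
print(efh_loss(pred, ref, w_f=1.0))            # E-F
print(efh_loss(pred, ref, w_f=1.0, w_h=1.0))   # E-F-H
```

In practice the relative weights matter, because the Hessian term contributes many more targets per structure than the energy term.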

However, integrating Hessian information increases computational demands, resulting in longer training times (Figure d). Computing Hessian matrices using PyTorch’s autograd.grad function involves calculating second derivatives of the energy with respect to atomic coordinates, a complex and time-consuming process. Training times are approximately 25 times longer per epoch, and routine evaluation of Hessian RMSE further exacerbates the computational load. This trade-off, along with the increased computational cost of generating data, may not justify full Hessian inclusion for all systems. However, in scenarios where high accuracy is required on small data sets, or where accurate modeling of transition states and nonequilibrium dynamics is critical, the benefit may outweigh the cost.
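Much of this cost comes from the fact that a Hessian is assembled from repeated derivative evaluations: with reverse-mode autograd, each row of the 3N × 3N Hessian typically requires one more backward pass through the force computation. The dependency-free sketch below mimics this row-by-row structure using central finite differences of an analytic gradient. This is a swapped-in technique for illustration, not PyTorch autograd and not the authors' code: building the Hessian costs 2 × 3N gradient calls, whereas forces alone cost one. The toy harmonic energy is a hypothetical stand-in for an MLIP.

```python
class CountingGradient:
    """Wraps a gradient function and counts how many times it is evaluated."""

    def __init__(self, grad_fn):
        self.grad_fn = grad_fn
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.grad_fn(x)


def finite_difference_hessian(grad, x, h=1e-4):
    """Build the Hessian one row at a time from central differences of the gradient."""
    n = len(x)
    rows = []
    for i in range(n):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        gp, gm = grad(xp), grad(xm)
        rows.append([(gp[j] - gm[j]) / (2.0 * h) for j in range(n)])
    # Symmetrize to remove small numerical asymmetry
    return [[0.5 * (rows[i][j] + rows[j][i]) for j in range(n)] for i in range(n)]


# Hypothetical toy energy E = 0.5 * sum_i k_i * x_i^2, whose gradient is k_i * x_i
k = [100.0, 200.0, 300.0, 400.0, 500.0, 600.0]  # one value per coordinate (N = 2 atoms)
grad = CountingGradient(lambda x: [ki * xi for ki, xi in zip(k, x)])

x0 = [0.1] * len(k)
hessian = finite_difference_hessian(grad, x0)
print(grad.calls)  # 12 gradient evaluations = 2 * 3N for N = 2 atoms
```

Because the number of derivative passes grows with 3N, the per-structure overhead of Hessian training increases with molecule size, on top of the larger number of targets in the loss.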

Despite these challenges, future innovations in computational techniques are promising. New algorithms and optimization methods could streamline Hessian matrix calculations and their integration into MLIP training. Leveraging more efficient methodologies could mitigate current computational challenges. Additionally, evaluating hybrid strategies, such as selectively sampling Hessians only for high-curvature configurations, could optimize the balance between accuracy and computational cost. Pursuing these advancements is crucial to ensure that the benefits in predictive performance are not overshadowed by increased computational demands.

4. Conclusion

In conclusion, our investigation into integrating Hessian matrix data within ANI MLIP model training shows significantly enhanced predictive accuracy and extrapolation capabilities. Including Hessian data enables MLIP models to more accurately predict energies, forces, and second-order derivatives with fewer training examples, improving data efficiency.

MD simulations with Hessian-trained MLIPs show enhanced stability and robustness under dynamic conditions, sustaining longer simulation times and higher temperatures before failure. This makes them promising for realistic molecular dynamics studies, especially in reactive environments. NEB analysis highlights the critical role of Hessian training in accurately describing reaction pathways and transition states. While the energy-only (E) and energy-force (E–F) models failed to converge to stable MEPs, the E–F-H model successfully reproduced smooth PESs and transition state structures matching DFT calculations. These results support Hessian-trained MLIPs as efficient alternatives for reaction barrier predictions and mechanistic studies. In addition, generating vibrational spectra along reaction coordinates underscores the advantages of incorporating Hessian information: the model reproduces vibrational frequency trends, including the softening of the reaction-coordinate mode at the transition state, linking computational modeling with vibrational spectroscopy.

However, incorporating Hessian information substantially increases computational demands, not only in training time, which can be up to 25 times longer per epoch, but more critically in the cost of generating Hessian data from quantum chemistry calculations. For commonly used hybrid functionals, Hessians can be tens of times more expensive to compute than atomic forces. This trade-off between improved extrapolation accuracy and high data-generation cost underscores the need for strategic use of Hessian data, particularly in chemically reactive regimes, and for advances in computational techniques and algorithms to estimate Hessian errors, ensuring efficient and accurate MLIP models.

Overall, Hessian-trained MLIPs provide a powerful approach for molecular simulations, reaction modeling, and vibrational analysis. Their ability to accurately predict energies, forces, and PES curvatures makes them valuable for computational chemistry, catalysis, and materials science. Future work will focus on enhancing computational efficiency, scalability, and generalizability, expanding their applicability across diverse molecular systems.

Supplementary Material

ct5c00402_si_001.pdf (11.3 MB, PDF)

Acknowledgments

We would like to acknowledge Leon Alday-Toledo for initial discussions on this project. This work was supported in part through computational resources and services provided by the Institute for Cyber-Enabled Research at Michigan State University. A.R. would like to thank the Chemical Engineering and Material Science (ChEMS) department of Michigan State University and the College of Engineering of the same university for their partial support during this research in the form of a first-year fellowship and a summer fellowship, respectively.

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jctc.5c00402.

The Supporting Information includes detailed descriptions and visual representations of the data sets used in this study, with figures illustrating the composition and distribution of molecules in the data set containing reactants, TSs and products (Figures S1 and S2). MLIP architecture diagrams are also included to illustrate the inner functioning of the models used in this work (Figure S3). In addition, the ensemble prediction plots for the energy values along the IRC of a single reaction are shown in Figures S4 and S5, showcasing the predictions from the ensembles of 100 models for each type of fit. The Supporting Information also contains correlation plots that compare the predicted values of our models with the reference values obtained from the DFT calculations, provided for three different types of models: energy fitting, energy force fitting, and energy force–Hessian fitting across the three data sets (Figures S6–S14). The simulation times reached before failure in the MD simulations for each molecule in the MD17 data set are illustrated in Figure S15, in addition to a table of temperatures and simulation times reached before failure (Table S1) to show the stability differences between the three types of models under MD simulations. The NEB plot of the multiprocess reaction is shown in Figure S16. Finally, the data efficiency plots for the IRC data set are presented in Figure S17 as a comparison to the data efficiency plots for the NMS structures data set in Figure (PDF)

The authors declare no competing financial interest.

References

Westermayr J., Gastegger M., Schütt K. T., Maurer R. J.. Perspective on integrating machine learning into computational chemistry and materials science. J. Chem. Phys. 2021;154:154. doi: 10.1063/5.0047760. [DOI] [PubMed] [Google Scholar]
Gómez-Bombarelli R., Wei J. N., Duvenaud D., Hernández-Lobato J. M., Sánchez-Lengeling B., Sheberla D., Aguilera-Iparraguirre J., Hirzel T. D., Adams R. P., Aspuru-Guzik A.. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 2018;4:268–276. doi: 10.1021/acscentsci.7b00572. [DOI] [PMC free article] [PubMed] [Google Scholar]
Nandy A., Duan C., Taylor M. G., Liu F., Steeves A. H., Kulik H. J.. Computational discovery of transition-metal complexes: from high-throughput screening to machine learning. Chem. Rev. 2021;121:9927–10000. doi: 10.1021/acs.chemrev.1c00347. [DOI] [PubMed] [Google Scholar]
Back S., Yoon J., Tian N., Zhong W., Tran K., Ulissi Z. W.. Convolutional neural network of atomic surface structures to predict binding energies for high-throughput screening of catalysts. J. Phys. Chem. Lett. 2019;10:4401–4408. doi: 10.1021/acs.jpclett.9b01428. [DOI] [PubMed] [Google Scholar]
Smith J. S., Nebgen B. T., Zubatyuk R., Lubbers N., Devereux C., Barros K., Tretiak S., Isayev O., Roitberg A. E.. Approaching coupled cluster accuracy with a general-purpose neural network potential through transfer learning. Nat. Commun. 2019;10:2903. doi: 10.1038/s41467-019-10827-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zuo Y., Chen C., Li X., Deng Z., Chen Y., Behler J., Csányi G., Shapeev A. V., Thompson A. P., Wood M. A.. et al. Performance and cost assessment of machine learning interatomic potentials. J. Phys. Chem. A. 2020;124:731–745. doi: 10.1021/acs.jpca.9b08723. [DOI] [PubMed] [Google Scholar]
Kulichenko M., Smith J. S., Nebgen B., Li Y. W., Fedik N., Boldyrev A. I., Lubbers N., Barros K., Tretiak S.. The rise of neural networks for materials and chemical dynamics. J. Phys. Chem. Lett. 2021;12:6227–6243. doi: 10.1021/acs.jpclett.1c01357. [DOI] [PubMed] [Google Scholar]
Batzner S., Musaelian A., Sun L., Geiger M., Mailoa J. P., Kornbluth M., Molinari N., Smidt T. E., Kozinsky B.. E (3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nat. Commun. 2022;13:2453. doi: 10.1038/s41467-022-29939-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
Musaelian A., Batzner S., Johansson A., Sun L., Owen C. J., Kornbluth M., Kozinsky B.. Learning local equivariant representations for large-scale atomistic dynamics. Nat. Commun. 2023;14:579. doi: 10.1038/s41467-023-36329-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
Behler J.. First principles neural network potentials for reactive simulations of large molecular and condensed systems. Angew. Chem., Int. Ed. 2017;56:12828–12840. doi: 10.1002/anie.201703114. [DOI] [PubMed] [Google Scholar]
Segler M. H., Preuss M., Waller M. P.. Planning chemical syntheses with deep neural networks and symbolic AI. Nature. 2018;555:604–610. doi: 10.1038/nature25978. [DOI] [PubMed] [Google Scholar]
Coley C. W., Barzilay R., Jaakkola T. S., Green W. H., Jensen K. F.. Prediction of organic reaction outcomes using machine learning. ACS Cent. Sci. 2017;3:434–443. doi: 10.1021/acscentsci.7b00064. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kang P.-L., Liu Z.-P.. Reaction prediction via atomistic simulation: from quantum mechanics to machine learning. Iscience. 2021;24:102013. doi: 10.1016/j.isci.2020.102013. [DOI] [PMC free article] [PubMed] [Google Scholar]
Schwaller P., Vaucher A. C., Laino T., Reymond J.-L.. Prediction of chemical reaction yields using deep learning. Mach. Learn. 2021;2:015016. doi: 10.1088/2632-2153/abc81d. [DOI] [Google Scholar]
Pollice R., dos Passos Gomes G., Aldeghi M., Hickman R. J., Krenn M., Lavigne C., Lindner-D’Addario M., Nigam A., Ser C. T., Yao Z.. et al. Data-driven strategies for accelerated materials design. Acc. Chem. Res. 2021;54:849–860. doi: 10.1021/acs.accounts.0c00785. [DOI] [PMC free article] [PubMed] [Google Scholar]
Rosen A. S., Iyer S. M., Ray D., Yao Z., Aspuru-Guzik A., Gagliardi L., Notestein J. M., Snurr R. Q.. Machine learning the quantum-chemical properties of metal–organic frameworks for accelerated materials discovery. Matter. 2021;4:1578–1597. doi: 10.1016/j.matt.2021.02.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ryan K., Lengyel J., Shatruk M.. Crystal structure prediction via deep learning. J. Am. Chem. Soc. 2018;140:10158–10168. doi: 10.1021/jacs.8b03913. [DOI] [PubMed] [Google Scholar]
Graser J., Kauwe S. K., Sparks T. D.. Machine learning and energy minimization approaches for crystal structure predictions: a review and new horizons. Chem. Mater. 2018;30:3601–3612. doi: 10.1021/acs.chemmater.7b05304. [DOI] [Google Scholar]
Vamathevan J., Clark D., Czodrowski P., Dunham I., Ferran E., Lee G., Li B., Madabhushi A., Shah P., Spitzer M.. et al. Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discovery. 2019;18:463–477. doi: 10.1038/s41573-019-0024-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lavecchia A.. Machine-learning approaches in drug discovery: methods and applications. Drug discovery today. 2015;20:318–331. doi: 10.1016/j.drudis.2014.10.012. [DOI] [PubMed] [Google Scholar]
Gupta R., Srivastava D., Sahu M., Tiwari S., Ambasta R. K., Kumar P. Artificial intelligence to deep learning: machine intelligence approach for drug discovery. Mol. Diversity. 2021;25:1315–1360. doi: 10.1007/s11030-021-10217-3.
Mueller T., Hernandez A., Wang C. Machine learning for interatomic potential models. J. Chem. Phys. 2020;152:050902. doi: 10.1063/1.5126336.
Chan H., Narayanan B., Cherukara M. J., Sen F. G., Sasikumar K., Gray S. K., Chan M. K., Sankaranarayanan S. K. Machine learning classical interatomic potentials for molecular dynamics from first-principles training data. J. Phys. Chem. C. 2019;123:6941–6957. doi: 10.1021/acs.jpcc.8b09917.
Gastegger M., Behler J., Marquetand P. Machine learning molecular dynamics for the simulation of infrared spectra. Chem. Sci. 2017;8:6924–6935. doi: 10.1039/C7SC02267K.
Lee K., Yoo D., Jeong W., Han S. SIMPLE-NN: An efficient package for training and executing neural-network interatomic potentials. Comput. Phys. Commun. 2019;242:95–103. doi: 10.1016/j.cpc.2019.04.014.
Amin I., Raja S., Krishnapriyan A. Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians. arXiv preprint arXiv:2501.09009. 2025. doi: 10.48550/arXiv.2501.09009.
Shimamura K., Koura A., Shimojo F. Training method for refinement of machine-learning interatomic potential and its applications. Pathways. 2022;1:3.
Gao X., Ramezanghorbani F., Isayev O., Smith J. S., Roitberg A. E. TorchANI: A Free and Open Source PyTorch-Based Deep Learning Implementation of the ANI Neural Network Potentials. J. Chem. Inf. Model. 2020;60:3408–3415. doi: 10.1021/acs.jcim.0c00451.
Grambow C. A., Pattanaik L., Green W. H. Reactants, products, and transition states of elementary chemical reactions based on quantum chemistry. Sci. Data. 2020;7:137. doi: 10.1038/s41597-020-0460-4.
Williams N. J., Kabalan L., Stojanovic L., Zólyomi V., Pyzer-Knapp E. O. Hessian QM9: A quantum chemistry database of molecular Hessians in implicit solvents. Sci. Data. 2025;12:9. doi: 10.1038/s41597-024-04361-2.
Ruddigkeit L., Van Deursen R., Blum L. C., Reymond J.-L. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. J. Chem. Inf. Model. 2012;52:2864–2875. doi: 10.1021/ci300415d.
Ramakrishnan R., Dral P. O., Rupp M., Von Lilienfeld O. A. Quantum chemistry structures and properties of 134 kilo molecules. Sci. Data. 2014;1:140022. doi: 10.1038/sdata.2014.22.
Frisch M. J., Trucks G. W., Schlegel H. B., Scuseria G. E., Robb M. A., Cheeseman J. R., Scalmani G., Barone V., Petersson G. A., Nakatsuji H., Li X., Caricato M., Marenich A. V., Bloino J., Janesko B. G., Gomperts R., Mennucci B., Hratchian H. P., Ortiz J. V., Izmaylov A. F., Sonnenberg J. L., Williams-Young D., Ding F., Lipparini F., Egidi F., Goings J., Peng B., Petrone A., Henderson T., Ranasinghe D., Zakrzewski V. G., Gao J., Rega N., Zheng G., Liang W., Hada M., Ehara M., Toyota K., Fukuda R., Hasegawa J., Ishida M., Nakajima T., Honda Y., Kitao O., Nakai H., Vreven T., Throssell K., Montgomery J. A. Jr., Peralta J. E., Ogliaro F., Bearpark M. J., Heyd J. J., Brothers E. N., Kudin K. N., Staroverov V. N., Keith T. A., Kobayashi R., Normand J., Raghavachari K., Rendell A. P., Burant J. C., Iyengar S. S., Tomasi J., Cossi M., Millam J. M., Klene M., Adamo C., Cammi R., Ochterski J. W., Martin R. L., Morokuma K., Farkas O., Foresman J. B., Fox D. J. Gaussian 16, Revision C.01. Gaussian Inc.: Wallingford, CT, 2016.
Grimme S. Semiempirical GGA-type density functional constructed with a long-range dispersion correction. J. Comput. Chem. 2006;27:1787–1799. doi: 10.1002/jcc.20495.
Grimme S. Semiempirical hybrid density functional with perturbative second-order correlation. J. Chem. Phys. 2006;124:034108. doi: 10.1063/1.2148954.
Minenkov Y., Singstad Å., Occhipinti G., Jensen V. R. The accuracy of DFT-optimized geometries of functional transition metal compounds: a validation study of catalysts for olefin metathesis and other reactions in the homogeneous phase. Dalton Trans. 2012;41:5526–5541. doi: 10.1039/c2dt12232d.
Ditchfield R., Hehre W. J., Pople J. A. Self-consistent molecular-orbital methods. IX. An extended Gaussian-type basis for molecular-orbital studies of organic molecules. J. Chem. Phys. 1971;54:724–728. doi: 10.1063/1.1674902.
Hehre W. J., Ditchfield R., Pople J. A. Self-consistent molecular orbital methods. XII. Further extensions of Gaussian-type basis sets for use in molecular orbital studies of organic molecules. J. Chem. Phys. 1972;56:2257–2261. doi: 10.1063/1.1677527.
Hariharan P. C., Pople J. A. The influence of polarization functions on molecular orbital hydrogenation energies. Theor. Chim. Acta. 1973;28:213–222. doi: 10.1007/BF00533485.
Hariharan P., Pople J. Accuracy of AHn equilibrium geometries by single determinant molecular orbital theory. Mol. Phys. 1974;27:209–214. doi: 10.1080/00268977400100171.
Gordon M. S. The isomers of silacyclopropane. Chem. Phys. Lett. 1980;76:163–168. doi: 10.1016/0009-2614(80)80628-2.
Francl M. M., Pietro W. J., Hehre W. J., Binkley J. S., Gordon M. S., DeFrees D. J., Pople J. A. Self-consistent molecular orbital methods. XXIII. A polarization-type basis set for second-row elements. J. Chem. Phys. 1982;77:3654–3665. doi: 10.1063/1.444267.
Kruse H., Grimme S. A geometrical correction for the inter- and intra-molecular basis set superposition error in Hartree-Fock and density functional theory calculations for large systems. J. Chem. Phys. 2012;136:154101. doi: 10.1063/1.3700154.
Smith J. S., Isayev O., Roitberg A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 2017;8:3192–3203. doi: 10.1039/C6SC05720A.
Smith J. S., Isayev O., Roitberg A. E. ANI-1, A data set of 20 million calculated off-equilibrium conformations for organic molecules. Sci. Data. 2017;4:170193. doi: 10.1038/sdata.2017.193.
Smith J. S., Nebgen B., Lubbers N., Isayev O., Roitberg A. E. Less is more: Sampling chemical space with active learning. J. Chem. Phys. 2018;148:241733. doi: 10.1063/1.5023802.
Smith J. S., Zubatyuk R., Nebgen B., Lubbers N., Barros K., Roitberg A. E., Isayev O., Tretiak S. The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules. Sci. Data. 2020;7:134. doi: 10.1038/s41597-020-0473-z.
Devereux C., Smith J. S., Huddleston K. K., Barros K., Zubatyuk R., Isayev O., Roitberg A. E. Extending the Applicability of the ANI Deep Learning Molecular Potential to Sulfur and Halogens. J. Chem. Theory Comput. 2020;16:4192–4202. doi: 10.1021/acs.jctc.0c00121.
Behler J., Parrinello M. Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett. 2007;98:146401. doi: 10.1103/physrevlett.98.146401.
Behler J. Atom-centered symmetry functions for constructing high-dimensional neural network potentials. J. Chem. Phys. 2011;134:074106. doi: 10.1063/1.3553717.
Behler J. Constructing high-dimensional neural network potentials: A tutorial review. Int. J. Quantum Chem. 2015;115:1032–1050. doi: 10.1002/qua.24890.
Larsen A. H., Mortensen J. J., Blomqvist J., Castelli I. E., Christensen R., Dułak M., Friis J., Groves M. N., Hammer B., Hargus C., et al. The atomic simulation environment: a Python library for working with atoms. J. Phys.: Condens. Matter. 2017;29:273002. doi: 10.1088/1361-648X/aa680e.
Chmiela S., Tkatchenko A., Sauceda H. E., Poltavsky I., Schütt K. T., Müller K. R. Machine learning of accurate energy-conserving molecular force fields. Sci. Adv. 2017;3:e1603015. doi: 10.1126/sciadv.1603015.
Christensen A. S., Von Lilienfeld O. A. On the role of gradients for machine learning of molecular energies and forces. Mach. Learn.: Sci. Technol. 2020;1:045018. doi: 10.1088/2632-2153/abba6f.
Jónsson H., Mills G., Jacobsen K. W. In Classical and Quantum Dynamics in Condensed Phase Simulations; World Scientific, 1998; pp 385–404.
Henkelman G., Jónsson H. Improved tangent estimate in the nudged elastic band method for finding minimum energy paths and saddle points. J. Chem. Phys. 2000;113:9978–9985. doi: 10.1063/1.1323224.
Henkelman G., Uberuaga B. P., Jónsson H. A climbing image nudged elastic band method for finding saddle points and minimum energy paths. J. Chem. Phys. 2000;113:9901–9904. doi: 10.1063/1.1329672.
Smidstrup S., Pedersen A., Stokbro K., Jónsson H. Improved initial guess for minimum energy path calculations. J. Chem. Phys. 2014;140:214106. doi: 10.1063/1.4878664.
Lindgren P., Kastlunger G., Peterson A. A. Scaled and dynamic optimizations of nudged elastic bands. J. Chem. Theory Comput. 2019;15:5787–5793. doi: 10.1021/acs.jctc.9b00633.
Fu X., Wu Z., Wang W., Xie T., Keten S., Gomez-Bombarelli R., Jaakkola T. Forces are not enough: Benchmark and critical evaluation for machine learning force fields with molecular simulations. arXiv preprint arXiv:2210.07237. 2022. doi: 10.48550/arXiv.2210.07237.
Batatia I., Benner P., Chiang Y., Elena A. M., Kovács D. P., Riebesell J., Advincula X. R., Asta M., Avaylon M., Baldwin W. J., et al. A foundation model for atomistic materials chemistry. arXiv preprint arXiv:2401.00096. 2023. doi: 10.48550/arXiv.2401.00096.
Anstine D. M., Zubatyuk R., Isayev O. AIMNet2: a neural network potential to meet your neutral, charged, organic, and elemental-organic needs. Chem. Sci. 2025;16:10228–10244. doi: 10.1039/d4sc08572h.
Smith J. S., Lubbers N., Thompson A. P., Barros K. Simple and efficient algorithms for training machine learning potentials to force data. arXiv preprint arXiv:2006.05475. 2020. doi: 10.48550/arXiv.2006.05475.

Associated Data

Supplementary Materials

ct5c00402_si_001.pdf (11.3 MB, PDF)
