Abstract
We introduce a multiscale machine-learning molecular dynamics (MD) strategy for simulating infrared spectra of solvated molecules. Our approach integrates an efficient sampling of environmental configurations with a hierarchical model that predicts forces and dipole moments as analytical derivatives of the energy, allowing IR spectra simulations from MD trajectories. Solvent effects are incorporated through a molecular mechanics (MM) representation of the environment embedded within the ML description of the solute. Applied to representative biorelated systems, the resulting ML/MM framework reproduces experimental spectra with high fidelity and accurately captures solvent-driven vibrational shifts. This approach provides a computationally efficient and robust route for describing solvent effects in vibrational spectroscopy.


1. Introduction
Infrared (IR) spectroscopy provides a distinctive molecular fingerprint for identifying and characterizing chemical species, yet the growing complexity of modern experimental applications increasingly demands rigorous computational support. As advanced spectroscopic techniques probe larger and more complex systems, computational approaches have become indispensable for decoding, interpreting, and validating the measured vibrational signatures.
The most widely used computational strategy for predicting IR spectra is based on the harmonic approximation of the potential energy surface (PES) around a local minimum obtained via geometry optimization. In this framework, vibrational frequencies and normal modes are derived from the Hessian matrix of the PES calculated with a selected quantum mechanical (QM) approach. Although this procedure can be a good starting point for IR spectra of isolated molecules, it intrinsically neglects the anharmonicity of the PES, which is typically accounted for only through empirical scaling factors. Extensions to solvated systems typically rely on hybrid schemes in which a QM treatment of the solute is embedded in a continuum solvation model, occasionally supplemented with a small number of explicit solvent molecules to capture specific solute–solvent interactions. Such approaches, however, inherently neglect the influence of solvent dynamics, which cannot always be reliably averaged within a continuum representation.
An alternative approach to IR spectral predictions is provided by ab initio molecular dynamics (AIMD), which explicitly incorporates nuclear motion and its impact on the infrared response. In this context too, solvent effects are typically treated using a hybrid QM/classical scheme, but the environment is now represented at the atomistic level through a molecular mechanics (MM) force field. Using this strategy, solvent configurations are sampled during the simulation, and the IR spectrum can be obtained from the Fourier transform of the dipole–dipole autocorrelation function. Since this procedure avoids assumptions about the PES, it implicitly captures anharmonic effects to a certain degree. The main limitation of this “dynamical” strategy is the computational cost of the MD trajectory.
In recent years, machine learning (ML) has gained popularity for accelerating AIMD simulations. Indeed, by replacing the QM calculation with an ML prediction, simulations can be accelerated by several orders of magnitude, while retaining an accuracy comparable to that of the reference level of theory adopted for the training. This paved the way for the development of ML force fields (MLFFs), at first only for isolated molecules and, more recently, in combination with molecular mechanics (ML/MM) to account for solvation effects through the electrostatic embedding scheme.
Few attempts have been made to combine effective ML-based simulations with the prediction of IR spectra. The missing piece, in this context, is the dipole moment prediction. Early approaches focused on isolated systems and employed separate models dedicated to the dipole moment, such as neural networks (NNs) and Gaussian process regression (GPR) models. This approach evolved with the development of NNs capable of directly predicting dipole moments alongside energies and forces. Along this line, Gastegger et al. proposed a field-dependent NN model able to predict IR spectra in solvated systems.
In previous works, we introduced an electrostatic embedding ML/MM scheme based on GPR. In this study, we extend this approach to predict solvent effects in IR spectra, evaluating dipole moments along ML/MM trajectories using analytically derived atomic partial charges.
A key challenge in this context is the data set construction: although AIMD offers optimal sampling of solvent configurations, its computational cost is substantial. For isolated molecules, variations in internal geometry are typically explored through displacements along normal modes. However, a similar procedure is not suitable for sampling solvent configurations. Furthermore, our aim is to remain solvent-agnostic, since the environment is described through the electrostatic potential and is therefore transferable, in principle, across solvents. To achieve this, we construct environment data sets by placing suitably scaled random charges on layers built around the geometries obtained from the normal-mode displacement. These data sets can be directly employed or refined using geometries extracted from ML/MM simulations run using a preliminary model.
We apply our models and sampling strategy to compute gas-phase and solvated IR spectra of uracil, N-methylacetamide, and alanine dipeptide, demonstrating an accurate description of solvation effects.
2. Methods
A general scheme of the proposed ML/MM protocol for IR spectra in solution is shown in Figure 1. In the next subsections, we outline the methodology adopted to predict QM/MM energies, forces, and dipole moments, as well as the subsequent derivation of IR spectra from the dipole–dipole correlation. The final subsection presents the strategy developed to generate artificial solvent configurations for training the environment model.
Figure 1. Overview of the proposed workflow. 1. Select a molecule; 2. Construct the data set for the isolated molecule via normal-mode displacements and use these isolated geometries as the basis for generating the environment data set, by placing charges on layers around the molecule; 3. Train the hierarchical model: one model for the vacuum contribution (targeting forces and, if required, energies), and another for the environment contribution (targeting forces, energies, and total charge); 4. Perform ML/MM MD simulations in different environmental conditions, predicting the dipole moment for each MD step; 5. Simulate the IR spectrum as the Fourier transform of the dipole autocorrelation function (in this case, we are using the time derivative of the dipole moment) and compare the spectra across solvents.
2.1. ML/MM Model for Energy, Forces, and Dipole Moment
Recently, we proposed a strategy to predict electrostatic embedding QM/MM energies and forces for ground and excited states using GPR. In this context, the QM/MM energy is written as
$$E_{\mathrm{QM/MM}} = E_{\mathrm{QM}}^{\mathrm{vac}} + E_{\mathrm{QM/MM}}^{\mathrm{el}} + E_{\mathrm{QM}}^{\mathrm{pol}} = E_{\mathrm{QM}}^{\mathrm{vac}} + E_{\mathrm{QM-MM}} \qquad (1)$$
where $E_{\mathrm{QM}}^{\mathrm{vac}}$ is the energy of the QM system in vacuum, $E_{\mathrm{QM/MM}}^{\mathrm{el}}$ describes the electrostatic interaction between MM charges and the QM density, and $E_{\mathrm{QM}}^{\mathrm{pol}}$ accounts for the polarization of the QM region due to the MM charges. The last two contributions are gathered into $E_{\mathrm{QM-MM}}$, representing the modifications of the QM energy due to the presence of the external environment. Our hierarchical prediction is then written as
| 2 |
where separate models are trained for the vacuum and environment contributions, and forces are obtained as the negative gradient of this expression with respect to atomic positions. Here, χ denotes the descriptors, Θ the hyperparameters of each model, and the hat denotes model estimates.
The vacuum component only depends on the changes in the internal geometry of the QM system. Therefore, we adopted the inverse distances (ID) matrix as descriptor:
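$$\chi_{\mathrm{ID},i,ab} = \frac{1}{\lVert \mathbf{R}_{i,a} - \mathbf{R}_{i,b} \rVert} \qquad (3)$$
where i runs over samples, a and b run over QM atoms, and $\mathbf{R}_{i,a}$ is the position of atom a in sample i. Since $\chi_{\mathrm{ID},i,ab}$ is symmetric for each sample i, only the unique off-diagonal elements of the matrix (a < b) are retained. For this model we adopted the Matérn kernel with ν = 2.5:
$$k(\chi, \chi') = \left(1 + \frac{\sqrt{5}\,d}{\lambda} + \frac{5 d^{2}}{3 \lambda^{2}}\right) \exp\!\left(-\frac{\sqrt{5}\,d}{\lambda}\right), \qquad d = \lVert \chi - \chi' \rVert \qquad (4)$$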
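As a minimal illustration of the vacuum-model ingredients described above, the following sketch computes the unique inverse distances of a toy geometry and evaluates a Matérn kernel with ν = 2.5 between two descriptor vectors. The function names, the toy geometries, and the length scale are illustrative assumptions and do not reproduce the GPX or Moldex implementations.

```python
import math
from itertools import combinations

def inverse_distance_descriptor(coords):
    """Unique inverse distances 1/|R_a - R_b| for all atom pairs a < b."""
    return [1.0 / math.dist(coords[a], coords[b])
            for a, b in combinations(range(len(coords)), 2)]

def matern52(x, y, length_scale):
    """Matern kernel with nu = 2.5 between two descriptor vectors."""
    s = math.sqrt(5.0) * math.dist(x, y) / length_scale
    return (1.0 + s + s * s / 3.0) * math.exp(-s)

# Toy three-atom geometries in arbitrary units (illustration only).
geom_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
geom_b = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (0.0, 2.0, 0.0)]

chi_a = inverse_distance_descriptor(geom_a)
chi_b = inverse_distance_descriptor(geom_b)
k_aa = matern52(chi_a, chi_a, length_scale=10.0)  # identical inputs give 1
k_ab = matern52(chi_a, chi_b, length_scale=10.0)  # slightly below 1
```

Because the descriptor depends only on interatomic distances, it is invariant to rigid translations and rotations of the molecule.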
Training is performed only on forces, by solving the following linear system:
| 5 |
where
| 6 |
and ∇1 and ∇2 represent the derivatives with respect to the first and the second arguments of the kernel. Thus, the Hessian is defined as
| 7 |
For a new configuration χ*, the predicted forces and energies are
| 8 |
| 9 |
where the energy is predicted as the integral of the forces and is therefore defined up to a constant C, determined on the training set.
The solvent contribution in eq 2 depends on both the QM internal geometry and the external MM charges. Therefore, in addition to χ_ID, we introduced the potential descriptor, computed as the electrostatic potential due to external MM charges at the QM positions:
$$V_{i,a} = \sum_{m} \frac{q_{i,m}}{\lVert \mathbf{R}_{i,a} - \mathbf{R}_{i,m} \rVert} \qquad (10)$$
where i runs over samples, a runs over QM atoms and m runs over MM atoms. For the direct effect of the solvent, we used a second-order polynomial kernel:
| 11 |
The internal and direct kernels were multiplicatively combined to obtain the full environment kernel:
| 12 |
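The construction of the environment kernel can be sketched as follows. The potential descriptor follows the definition given in the text (potential of the MM point charges at each QM atom, in atomic units). The exact form of the second-order polynomial kernel is not recoverable here, so the expression `(v1 · v2 + theta) ** 2` is an assumption, as are all function names and the toy coordinates.

```python
import math

def potential_descriptor(qm_coords, mm_coords, mm_charges):
    """Electrostatic potential of the MM point charges at each QM atom (a.u.)."""
    return [sum(q / math.dist(r_a, r_m) for r_m, q in zip(mm_coords, mm_charges))
            for r_a in qm_coords]

def poly2_kernel(v1, v2, theta=1.0):
    """Second-order polynomial kernel; this specific form is an assumption."""
    return (sum(a * b for a, b in zip(v1, v2)) + theta) ** 2

def environment_kernel(k_internal, v1, v2, theta=1.0):
    """Product of an internal-geometry kernel value and the direct kernel."""
    return k_internal * poly2_kernel(v1, v2, theta)

# Toy system: two QM atoms and two MM point charges (atomic units).
qm = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
mm = [(0.0, 0.0, 4.0), (2.0, 0.0, -4.0)]
mm_q = [0.5, -0.5]

v = potential_descriptor(qm, mm, mm_q)
k_env = environment_kernel(0.5, v, v)  # 0.5 stands in for an internal kernel value
```

With a nonzero intercept θ, the direct kernel does not vanish at V = 0, so the product kernel remains informative in the vacuum limit.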
In our previous work, the environment model was trained on energies and forces, targeting the following quantities:
| 13 |
The corresponding linear system and predictive expressions are reported in eqs 14–16.
| 14 |
| 15 |
| 16 |
The expression we get for the energy in terms of the kernel is particularly convenient, because we can derive many physical properties as analytical derivatives of the energy. In particular, a component of the molecular dipole moment μ can be derived as
$$\mu_{\lambda} = -\frac{\partial E}{\partial \mathcal{E}_{\lambda}} \qquad (17)$$
where λ = x, y, or z, and $\boldsymbol{\mathcal{E}}$ is a uniform static external electric field. As our descriptor encodes the electrostatic potential at QM atoms, instead of an electric field, we can conveniently apply the chain rule:
$$\mu_{\lambda} = -\sum_{a} \frac{\partial E}{\partial V_{a}} \frac{\partial V_{a}}{\partial \mathcal{E}_{\lambda}}$$
In this picture, the potential acts as a localized perturbation, and analogously to eq 17, the partial atomic charge q_a can be defined as
$$q_{a} = \frac{\partial E}{\partial V_{a}} \qquad (18)$$
where the derivative is taken while the potentials at all other atomic sites are kept fixed, thereby isolating the local electrostatic response associated with that atom. Assuming the standard form of the potential generated by a uniform electric field, $V_{a} = -\boldsymbol{\mathcal{E}} \cdot \mathbf{R}_{a}$, the dipole moment can be written as
$$\boldsymbol{\mu} = \sum_{a} q_{a} \mathbf{R}_{a} \qquad (19)$$
According to eq 2, the potential descriptor enters only in the environment contribution to the energy; therefore, the partial charges are determined only by the environment model. Since a uniform shift δV of all site potentials changes the energy by QδV, it must also hold that $\sum_{a} q_{a} = Q$, where Q is the total charge of the QM moiety.
Building upon this formulation, and to further impose the charge constraint $\sum_{a} q_{a} = Q$ during training, we include the total charge in the training, associated with a very small regularization parameter (σ_q = 10⁻⁵). The linear system to be solved for the environment model becomes
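The definition of partial charges as derivatives of the energy with respect to the site potentials can be checked numerically on a toy model. The sketch below uses a hypothetical quadratic energy function (fixed charges plus a linear-response term) and central finite differences; the model described in the text obtains the same derivatives analytically from the kernel expression, which is not reproduced here. All names and numbers are illustrative assumptions.

```python
def toy_energy(v, q0, alpha):
    """E(V) = sum_a q0_a V_a + 1/2 sum_ab alpha_ab V_a V_b (toy model)."""
    n = len(v)
    linear = sum(q0[a] * v[a] for a in range(n))
    quadratic = 0.5 * sum(alpha[a][b] * v[a] * v[b]
                          for a in range(n) for b in range(n))
    return linear + quadratic

def charges_from_energy(energy, v, h=1e-5):
    """q_a = dE/dV_a by central differences, other site potentials held fixed."""
    charges = []
    for a in range(len(v)):
        v_plus, v_minus = list(v), list(v)
        v_plus[a] += h
        v_minus[a] -= h
        charges.append((energy(v_plus) - energy(v_minus)) / (2.0 * h))
    return charges

q0 = [0.3, -0.3]                      # charges of the unperturbed (neutral) solute
alpha = [[-0.2, 0.2], [0.2, -0.2]]    # columns sum to zero: charge is conserved
v = [0.05, -0.02]                     # site potentials from the environment

q = charges_from_energy(lambda x: toy_energy(x, q0, alpha), v)
total_charge = sum(q)
```

In this toy model the environment polarizes the solute (the charges change from ±0.3 to ±0.286) while the total charge stays at zero, which is the constraint that the text imposes during training.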
| 20 |
where . For this new model, the predictions for energy, forces, and partial charges are given by
| 21 |
| 22 |
| 23 |
The dipole moment is then computed from the partial charges as
$$\hat{\boldsymbol{\mu}} = \sum_{a} \hat{q}_{a} \mathbf{R}_{a} \qquad (24)$$
Note that the environment model can also provide the dipole moment in vacuum, obtained by evaluating the partial charges at V = 0.
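For concreteness, the point-charge expression for the dipole moment can be written in a few lines. The charges and coordinates below are made-up values for a neutral three-site toy molecule, not model output.

```python
def dipole_from_charges(charges, coords):
    """mu = sum_a q_a R_a, returned as an (x, y, z) tuple."""
    return tuple(sum(q * r[k] for q, r in zip(charges, coords)) for k in range(3))

# Neutral toy molecule (atomic units, illustrative values).
charges = [-0.4, 0.2, 0.2]
coords = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (-1.0, 0.5, 0.0)]
mu = dipole_from_charges(charges, coords)

# For a neutral system the dipole does not depend on the choice of origin.
shifted = [(x + 5.0, y - 3.0, z + 2.0) for x, y, z in coords]
mu_shifted = dipole_from_charges(charges, shifted)
```

For a charged QM region (Q ≠ 0) the dipole computed in this way depends on the origin, so a consistent reference point is needed along the trajectory.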
We also introduced permutational symmetry, adopting the standard procedure for kernel methods. Letting P_p denote the permutation matrix for the pth possible permutation and S the number of permutations, the kernel is replaced by a symmetrized kernel, calculated as
$$k_{\mathrm{sym}}(\chi, \chi') = \frac{1}{S} \sum_{p=1}^{S} k(\chi, \mathbf{P}_{p}\chi') \qquad (25)$$
and analogously for all derivatives. For instance, the Hessian kernel becomes
| 26 |
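The effect of kernel symmetrization can be illustrated with a small sketch. Here the permutations act on atom indices, the descriptor is recomputed for each permuted geometry, and a Gaussian kernel is used as a stand-in for the model kernel. The toy fragment, the choice of all six permutations of three equivalent atoms, and the function names are assumptions made for illustration.

```python
import math
from itertools import combinations, permutations

def descriptor(coords):
    """Unique inverse distances for all atom pairs a < b."""
    return [1.0 / math.dist(coords[a], coords[b])
            for a, b in combinations(range(len(coords)), 2)]

def base_kernel(x, y, length_scale=1.0):
    """Gaussian kernel used here as a stand-in for the model kernel."""
    return math.exp(-0.5 * (math.dist(x, y) / length_scale) ** 2)

def symmetrized_kernel(coords1, coords2, atom_perms):
    """Average of the base kernel over permutations of the second geometry."""
    x = descriptor(coords1)
    total = 0.0
    for perm in atom_perms:
        permuted = [coords2[i] for i in perm]
        total += base_kernel(x, descriptor(permuted))
    return total / len(atom_perms)

# Toy methyl-like fragment: atom 0 is unique, atoms 1-3 are equivalent.
perms = [(0,) + p for p in permutations((1, 2, 3))]
g1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.1, 0.0), (0.0, 0.0, 1.2)]
g2 = [g1[0], g1[2], g1[3], g1[1]]  # same structure with relabeled atoms

k_plain = base_kernel(descriptor(g1), descriptor(g2))  # depends on labeling
k_same = symmetrized_kernel(g1, g1, perms)
k_relabeled = symmetrized_kernel(g1, g2, perms)        # equals k_same
```

The plain kernel treats the relabeled structure as a different configuration, whereas the symmetrized kernel returns the same value for both labelings.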
2.2. Infrared Spectrum Calculation
The infrared spectrum is obtained from the autocorrelation function of the dipole moment along the ML/MM simulation:
| 27 |
where β = 1/(k_B T), c is the speed of light in vacuum, V is the volume, and ⟨μ̇(0)μ̇(t)⟩ is the autocorrelation function of the time derivative of the dipole moment. Equation 27 assumes the use of a harmonic quantum correction for the line shape. Ideally, the autocorrelation function should be computed on an infinite trajectory. To avoid truncation (“border”) effects due to the finite length of the simulations, the autocorrelation function is multiplied by a Gaussian window with σ = 1.0 ps. The spectra reported in the Results section are normalized.
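The procedure described above (time derivative of the dipole, autocorrelation, Gaussian window, Fourier transform, normalization) can be sketched with the standard library alone. This is only a line-shape illustration under stated assumptions: the physical prefactor involving β, c, and V is omitted, a synthetic single-frequency dipole replaces a real trajectory, and the window width of 150 fs is chosen for the short toy signal rather than the 1.0 ps used in the text. For production trajectories an FFT-based implementation would be preferable.

```python
import math

C_CM_PER_FS = 2.99792458e-5  # speed of light in cm/fs

def time_derivative(series, dt):
    """Central finite differences of a sequence of 3-vectors."""
    return [tuple((series[i + 1][k] - series[i - 1][k]) / (2.0 * dt) for k in range(3))
            for i in range(1, len(series) - 1)]

def autocorrelation(series, max_lag):
    """<v(0).v(t)> averaged over time origins for lags 0 .. max_lag - 1."""
    n = len(series)
    acf = []
    for lag in range(max_lag):
        total = sum(series[i][k] * series[i + lag][k]
                    for i in range(n - lag) for k in range(3))
        acf.append(total / (n - lag))
    return acf

def ir_lineshape(dipoles, dt_fs, wavenumbers, sigma_fs):
    """Normalized line shape from the windowed dipole-derivative ACF."""
    mu_dot = time_derivative(dipoles, dt_fs)
    acf = autocorrelation(mu_dot, len(mu_dot) // 2)
    windowed = [a * math.exp(-0.5 * (j * dt_fs / sigma_fs) ** 2)
                for j, a in enumerate(acf)]
    spectrum = []
    for nu in wavenumbers:
        omega = 2.0 * math.pi * C_CM_PER_FS * nu  # angular frequency in rad/fs
        # The ACF is even in time, so a cosine transform is sufficient.
        tail = sum(w * math.cos(omega * j * dt_fs)
                   for j, w in enumerate(windowed) if j > 0)
        spectrum.append((windowed[0] + 2.0 * tail) * dt_fs)
    peak = max(spectrum)
    return [s / peak for s in spectrum]

# Synthetic signal: one dipole component oscillating at 1700 cm^-1, 1 fs time step.
dt_fs = 1.0
omega0 = 2.0 * math.pi * C_CM_PER_FS * 1700.0
dipoles = [(math.cos(omega0 * i * dt_fs), 0.0, 0.0) for i in range(1000)]
wavenumbers = [1000.0 + 10.0 * i for i in range(141)]  # 1000 .. 2400 cm^-1

spectrum = ir_lineshape(dipoles, dt_fs, wavenumbers, sigma_fs=150.0)
peak_position = wavenumbers[spectrum.index(max(spectrum))]
```

The Gaussian window sets the line width of the computed band: a narrower window in time gives a broader peak, which is why the window should be long compared with the vibrational dephasing times of interest.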
2.3. Generation of Environment Configurations
To sample environment configurations, we constructed a grid of points around each selected solute molecule, generated with the Merz–Singh–Kollman scheme. Four layers of points were constructed around each configuration of the isolated molecule, with a minimum distance from the molecule of 3 Å, an interlayer distance of 2 Å, and a maximum cutoff of 11 Å. From the external point grid obtained, all points closer than 1.3 Å were discarded. The electrostatic potential of the molecule at these grid points was estimated from Mulliken charges (q_a) extracted from vacuum calculations:
$$V_{j} = \sum_{a} \frac{q_{a}}{\lVert \mathbf{r}_{j} - \mathbf{R}_{a} \rVert} \qquad (28)$$
From this grid, 3000 points were stochastically selected with probabilities p_j, such that the retained points preferentially sampled regions of stronger electrostatic potential. Random numbers ξ_j ∈ [0, 1) were drawn, and the 3000 points corresponding to the smallest ratios ξ_j/p_j were selected. External charges were then assigned to each grid point by sampling 3000 values from a normal distribution. To suppress nonphysical interactions, a damping function was applied:
| 29 |
where the parameter a was empirically set to 0.1. In particular, we computed the final external charges as q̃_i = f_damp(q_i V_i)/V_i, which rescales the charge only for strongly interacting sites. To enforce charge neutrality, the mean of the external charges was subtracted.
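The selection and neutralization steps can be sketched as follows. Several ingredients are assumptions made only for illustration: random points on a single spherical shell stand in for the Merz–Singh–Kollman layers, the selection probability is taken as p_j ∝ |V_j|, and the width of the normal distribution is arbitrary. The damping function is not reproduced because its form cannot be recovered from the text.

```python
import math
import random

def random_shell(n, radius, rng):
    """n random points on a sphere (stand-in for one grid layer)."""
    points = []
    for _ in range(n):
        x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        norm = math.sqrt(x * x + y * y + z * z)
        points.append((radius * x / norm, radius * y / norm, radius * z / norm))
    return points

def grid_potential(grid, atoms, atom_charges):
    """Potential of the solute point charges at each grid point (a.u.)."""
    return [sum(q / math.dist(p, r) for r, q in zip(atoms, atom_charges))
            for p in grid]

def select_points(potentials, n_select, rng):
    """Keep the n_select points with the smallest ratios xi_j / p_j."""
    total = sum(abs(v) for v in potentials)
    keys = []
    for j, v in enumerate(potentials):
        p_j = abs(v) / total  # assumed form of the selection probability
        if p_j > 0.0:
            keys.append((rng.random() / p_j, j))
    keys.sort()
    return [j for _, j in keys[:n_select]]

def neutral_random_charges(n, sigma, rng):
    """Normally distributed charges with the mean subtracted (neutral set)."""
    raw = [rng.gauss(0.0, sigma) for _ in range(n)]
    mean = sum(raw) / n
    return [q - mean for q in raw]

rng = random.Random(0)
atoms = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)]   # toy solute
atom_charges = [-0.5, 0.5]                   # stand-in for Mulliken charges
grid = random_shell(200, 4.0, rng)
potentials = grid_potential(grid, atoms, atom_charges)
selected = select_points(potentials, 50, rng)
external_charges = neutral_random_charges(len(selected), 0.1, rng)
```

Points with a large |V_j| have a large p_j and therefore a small ratio ξ_j/p_j on average, so they are retained more often, as described in the text.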
On these artificial solute–solvent configurations, we performed electrostatic embedding QM/MM calculations of energy and forces.
3. Computational Details
3.1. Data Set Generation
We considered three molecular systems: uracil (Ura), as a relatively simple test case; N-methylacetamide (NMA), which has the same number of atoms but introduces the challenge of permutational symmetry in methyl groups; and alanine dipeptide (Ala2), chosen as a larger and more flexible molecule (see Figure 2). For Ura and NMA, geometrical variations were obtained by applying random displacements along normal modes following a geometry optimization at the chosen reference level of theory. A vacuum calculation was then performed for each configuration. For Ala2, due to its conformational flexibility, we selected three conformers that are stable in aqueous solution: αR, β, and PII (see Figure 2). We optimized the three conformers (αR: Φ = −77°, Ψ = −21°; β: Φ = −159°, Ψ = 163°; PII: Φ = −65°, Ψ = 147°) and generated 400 geometries per conformer by applying random normal-mode displacements. External charges were assigned following the procedure explained in Section 2.3. For each molecule, we generated a data set consisting of 1200 samples, and we trained the models on 1000 samples. The Python scripts used for data set generation, as well as the data sets and trained models, are available at 10.5281/zenodo.18391996. Vacuum and QM/MM calculations were performed at the ωB97XD/6-31G(d) level of theory using the Gaussian suite of programs, and on these data sets we trained the Vac and Env models for each molecule.
Figure 2. Representation of the optimized geometries for the three molecules of interest: uracil, N-methylacetamide, and alanine dipeptide. For the latter, we considered the αR, β, and PII conformers.
Moreover, building on this base-model (M_Base = Vac + Env), we adopted a Δ-learning strategy to correct the predictions toward a higher level of theory. We chose the dispersion-corrected double-hybrid B2PLYP-D3/cc-pVTZ, which has been found to be reliable for vibrational properties. In this framework, the Δ-learning models were trained to reproduce the corrections:
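$$\Delta \mathbf{F} = \mathbf{F}^{\text{high-level}} - \hat{\mathbf{F}}_{\text{base-model}} \qquad (30)$$
$$\Delta E = E^{\text{high-level}} - \hat{E}_{\text{base-model}} \qquad (31)$$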
where the superscript “high-level” refers to energies and forces computed at the B2PLYP-D3/cc-pVTZ level of theory, whereas F̂ base‑model and Ê base‑model denote the predictions from the base-models. Separate Δ-learning models were trained for both the vacuum and the environment contributions. For Δ-learning models, a subset of 400 samples was extracted from the base data set and recomputed at the higher level.
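The bookkeeping behind Δ-learning is simple and can be written out explicitly: the correction model is trained on the difference between the high-level reference and the base-model prediction, and at prediction time the two outputs are summed. The numbers below are made-up energies used only to show the round trip.

```python
def delta_targets(high_level, base_prediction):
    """Training targets of the correction model: high level minus base prediction."""
    return [h - b for h, b in zip(high_level, base_prediction)]

def apply_correction(base_prediction, delta_prediction):
    """Corrected prediction: base-model output plus predicted correction."""
    return [b + d for b, d in zip(base_prediction, delta_prediction)]

# Made-up energies for three configurations (arbitrary units).
e_high = [-10.20, -10.05, -9.90]   # reference at the higher level of theory
e_base = [-10.00, -9.95, -9.85]    # base-model predictions

delta_e = delta_targets(e_high, e_base)
# With a perfect correction model, the high-level values are recovered.
e_corrected = apply_correction(e_base, delta_e)
```

Because the correction is usually smoother and smaller in magnitude than the full target, it can be learned from fewer high-level samples, which is consistent with the smaller Δ-learning data set used in the text.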
Finally, for each molecule, 500 configurations were extracted from the second half of the NPT equilibration trajectory (see Section 3.5) to be used as a test set, and the corresponding reference calculations were performed. For the Δ-learning test set, only 250 samples were considered.
3.2. Training Strategy
The models developed in this work were trained using our Python package GPX. The inverse distance and electrostatic potential descriptors were computed with the Moldex Python package. Both packages are implemented in JAX and are available on GitHub under the GNU LGPL license.
As outlined in the Methods section, our general protocol consists of training vacuum models only on forces and environment models on energies, forces, and total charge. For the latter, we enforce the same regularization parameter for energies and forces (σ = σ_e = σ_f), and fix the regularization parameter for the total charge at σ_q = 10⁻⁵. Also, the intercept parameter of the polynomial kernel was fixed at θ = 1. Thus, each model requires the tuning of two hyperparameters: the length scale λ of the Matérn kernel and the regularization σ. The only exception is the vacuum model of Ala2, where training only on forces was insufficient to correctly discriminate between conformers in terms of energy. This was expected, since the data set contains geometries localized around the three minima and force-based training encodes only local information. Therefore, energies were included in the training, again with σ = σ_e = σ_f.
The hyperparameter optimization was carried out using grid search and 4-fold cross-validation (CV). We tested λ = [10, 20, 30] and, for vacuum models, σ = [10⁻⁵, 10⁻⁴, 10⁻³], while for environment models σ = [10⁻⁴, 10⁻³, 10⁻²]. The use of higher σ values for the environment is based on prior experience, as we previously observed that too small σ values for the environment model can lead to instabilities of the system during simulation. Learning curves for all the trained models can be found in the SI (Figures S1, S10, and S13).
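A grid search with 4-fold cross-validation over a length scale and a regularization parameter can be sketched on a one-dimensional toy problem. This is not the GPX training code: the model is a plain energy-only Gaussian process regression, the regularization enters as σ² on the kernel diagonal (an assumption), the folds are interleaved by index, and the data are a synthetic sine curve. Only the grid values are taken from the text (vacuum-model grid).

```python
import math
from itertools import product

def matern52(x, y, length_scale):
    s = math.sqrt(5.0) * abs(x - y) / length_scale
    return (1.0 + s + s * s / 3.0) * math.exp(-s)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def cv_rmse(xs, ys, length_scale, sigma, n_folds=4):
    """Cross-validated RMSE of a GPR model with the given hyperparameters."""
    sq_err = 0.0
    for fold in range(n_folds):
        test = [i for i in range(len(xs)) if i % n_folds == fold]
        train = [i for i in range(len(xs)) if i % n_folds != fold]
        k = [[matern52(xs[i], xs[j], length_scale) + (sigma ** 2 if i == j else 0.0)
              for j in train] for i in train]
        alpha = solve(k, [ys[i] for i in train])
        for t in test:
            pred = sum(a * matern52(xs[t], xs[j], length_scale)
                       for a, j in zip(alpha, train))
            sq_err += (pred - ys[t]) ** 2
    return math.sqrt(sq_err / len(xs))

# Synthetic one-dimensional data set (illustration only).
xs = [float(i) for i in range(40)]
ys = [math.sin(0.3 * x) for x in xs]

grid = list(product([10.0, 20.0, 30.0], [1e-5, 1e-4, 1e-3]))
scores = {(ls, sg): cv_rmse(xs, ys, ls, sg) for ls, sg in grid}
best_length_scale, best_sigma = min(scores, key=scores.get)
```

Very small σ values make the kernel matrix poorly conditioned, which is one way in which a model that fits the training data well can still behave unstably on new configurations.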
3.3. Machine Learning Scores
For the validation of the models, we report the root mean squared error (RMSE), defined as follows:
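$$\mathrm{RMSE} = \sqrt{\frac{1}{\tilde{N}} \sum_{i=1}^{\tilde{N}} \left(y_{i} - \hat{y}_{i}\right)^{2}} \qquad (32)$$
where y_i and ŷ_i are the reference and predicted values, Ñ = N (the number of test points) for energies, Ñ = 3N·N_QM for forces (with N_QM the number of QM atoms), and Ñ = 3N for dipole moments. For dipole moments, we also report the mean absolute error (MAE), computed as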
| 33 |
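Both scores can be computed by flattening the reference and predicted arrays into lists of scalar values, which reproduces the counting of Ñ described above (for example, 3N·N_QM values for forces). The MAE below uses the standard component-wise definition, which is an assumption, and the force values are made up.

```python
import math

def flatten(values):
    """Flatten arbitrarily nested lists or tuples of numbers into one list."""
    if isinstance(values, (int, float)):
        return [float(values)]
    flat = []
    for v in values:
        flat.extend(flatten(v))
    return flat

def rmse(reference, predicted):
    ref, pred = flatten(reference), flatten(predicted)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(ref, pred)) / len(ref))

def mae(reference, predicted):
    ref, pred = flatten(reference), flatten(predicted)
    return sum(abs(r - p) for r, p in zip(ref, pred)) / len(ref)

# Made-up forces: N = 2 test points, N_QM = 2 atoms, 3 Cartesian components.
f_ref = [[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
         [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]
f_pred = [[(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
          [(0.0, 0.0, 0.0), (0.0, 0.0, -1.0)]]

n_values = len(flatten(f_ref))       # 3 * N * N_QM = 12
force_rmse = rmse(f_ref, f_pred)     # sqrt(2 / 12)
force_mae = mae(f_ref, f_pred)       # 2 / 12
```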
3.4. GPX-sire-OpenMM Interface
For performing ML/MM simulations, we extended the ML-server interface to integrate with sire, a molecular simulation framework that exploits OpenMM. This setup enables QM/MM dynamics using any external QM engine, including ML models. The interface is straightforward: sire requires only a callback function that takes as input QM coordinates, MM coordinates, and MM charges and returns energy, QM forces, and MM forces. As in our previous implementation with sander, pure MM forces and van der Waals interactions between QM and MM are computed by the molecular dynamics engine. A key advantage of the OpenMM-based approach is the GPU acceleration of the dynamics, which significantly improves performance given that the GPX software is written in JAX. Timings for all the performed simulations can be found in the SI, Table S1.
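The shape of such a callback can be illustrated with a toy function that follows the description above: it receives QM coordinates, MM coordinates, and MM charges and returns the energy, the forces on the QM atoms, and the forces on the MM atoms. The function name, argument order, plain-list data types, and atomic units are assumptions and do not reproduce the actual sire or ML-server interface; fixed point charges on the QM atoms stand in for the ML model.

```python
import math

QM_CHARGES = [0.4, -0.4]  # fixed toy charges standing in for the ML model

def toy_callback(qm_coords, mm_coords, mm_charges):
    """Return (energy, qm_forces, mm_forces) for a point-charge toy model."""
    energy = 0.0
    qm_forces = [[0.0, 0.0, 0.0] for _ in qm_coords]
    mm_forces = [[0.0, 0.0, 0.0] for _ in mm_coords]
    for a, (r_a, q_a) in enumerate(zip(qm_coords, QM_CHARGES)):
        for m, (r_m, q_m) in enumerate(zip(mm_coords, mm_charges)):
            dist = math.dist(r_a, r_m)
            energy += q_a * q_m / dist
            for k in range(3):
                # Force on the QM atom is -dE/dR_a; the MM atom feels the opposite.
                f = q_a * q_m * (r_a[k] - r_m[k]) / dist ** 3
                qm_forces[a][k] += f
                mm_forces[m][k] -= f
    return energy, qm_forces, mm_forces

qm = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
mm = [(0.0, 0.0, 2.0)]
energy, qm_forces, mm_forces = toy_callback(qm, mm, [1.0])
net_force = [sum(f[k] for f in qm_forces) + sum(f[k] for f in mm_forces)
             for k in range(3)]
```

Returning the forces on the MM atoms is what allows the environment to respond to the embedded region, while the remaining MM and van der Waals terms are left to the MD engine, as stated in the text.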
3.5. Molecular Dynamics Simulations
For Ura and NMA, MD inputs were prepared from optimized molecular geometries solvated in a truncated octahedron box of TIP3P water, extending 15 Å from the solute. In all simulations, TIP3P water was employed as a flexible model. We also solvated NMA in a truncated octahedron box of chloroform, extending 20 Å from the solute. We then performed an ML/MM minimization, followed by a 50 ps NPT equilibration at 300 K and 1 atm, using the Langevin thermostat (friction coefficient of 1 ps⁻¹) and the Monte Carlo barostat. Production ML/MM simulations were carried out for 1 ns in the NVT ensemble using the Langevin thermostat. Electrostatics among MM charges was treated using the particle mesh Ewald method with a cutoff of 12 Å, whereas all MM charges were explicitly included in the ML/MM electrostatic interaction. For simulations in vacuum, we performed an ML minimization, then a 50 ps NVT equilibration at 10 K for Ura and 20 K for NMA (resembling the conditions of the experimental reference spectra), using the Langevin thermostat (friction coefficient of 0.1 ps⁻¹). Production ML simulations were carried out for 1 ns in the NVT ensemble using the Langevin thermostat with the same friction coefficient adopted for the equilibration.
For Ala2, we considered two solvents: water and DMSO. Starting from the αR and PII optimized geometries employed for the training set generation, we constructed two octahedron boxes of TIP3P water (extending 15 Å from the solute), and DMSO molecules (extending 22 Å from the solute). The difference in box dimensions was chosen to yield a comparable number of MM atoms in each system. As for solvated Ura and NMA, we prepared the system through ML/MM minimization, followed by ML/MM NPT equilibration at 300 K and 1 atm for 50 ps. Subsequently, we ran 10 NVT replicas of 100 ps for each solvent and each conformation, retaining only those trajectories that preserved the corresponding conformation for IR spectrum generation.
Unless specified differently, all simulations were performed with sire (OpenMM) coupled to ML-server.
4. Results and Discussion
4.1. Uracil
After training on the artificial data set described in the Computational Details section, we tested the vacuum and environment models on structures obtained from MD simulations of uracil in water (see Section 3.5).
Table 1 shows that the ML/MM model achieves robust accuracy, with prediction errors (RMSE) of roughly 2.3 kcal·mol⁻¹ for energy and 1.6 kcal·mol⁻¹·Å⁻¹ for forces. An analysis of the individual contributions reveals that the dominant source of deviation stems from the environment model, which was trained on artificial charge configurations and subsequently transferred to water. Despite this, the dipole-moment error remains remarkably low (0.07 a.u.), underscoring the reliability of the overall description.
Table 1. Errors (RMSE) on Energy, Forces, and Dipole Moment Calculated on a Test Set of Aqueous Uracil
| property | Vac | Env | Tot |
|---|---|---|---|
| energy (kcal·mol⁻¹) | 0.02 | 2.30 | 2.29 |
| forces (kcal·mol⁻¹·Å⁻¹) | 0.27 | 1.55 | 1.58 |
| dipole gas-phase (a.u.) | | 0.01 | |
| dipole environment (a.u.) | | 0.07 | |
The second, third, and fourth columns indicate the error of the vacuum model (Vac), environment model (Env), and of the total ML/MM model (Tot = Vac + Env). In the bottom part, we report errors on the gas-phase and solvated dipole moments predicted with the Env model. The reference level of theory is ωB97XD/6-31G(d).
Figure 3 shows the correlation plot of the dipole moment, where Ura is rotated such that its molecular plane lies parallel to the x–y plane. The predictions on the μ_x and μ_y components show a strong correlation with the reference values, whereas the μ_z component is less accurately reproduced. This behavior is expected, as uracil is essentially planar. Deviations of the dipole arise only from environment-induced polarization. Because the dipole is approximated using charges restricted to the molecular plane, the out-of-plane component induced by the external charges cannot be properly captured. Nonetheless, this limitation is not expected to significantly affect the IR spectra.
Figure 3. Correlation plot of the dipole moment computed on a test set of aqueous uracil. All the molecules were rotated to place the plane of uracil parallel to the x–y plane. The reference level of theory is ωB97XD/6-31G(d).
To test the model, we performed a first comparison between the IR spectrum obtained from a QM/MM simulation and the ML/MM simulation, using the same reference QM method adopted for training. Both simulations were initiated from identical coordinates and velocities and propagated for 10 ps using sander in combination with Gaussian and ML-server, to ensure that they were performed under exactly the same conditions.
Figure 4 reports a direct comparison of the two IR spectra.
Figure 4. Comparison of IR spectra of aqueous uracil computed with QM/MM (red) and ML/MM (teal) molecular dynamics. The starred peaks indicate vibrations of the TIP3P water solvent.
As we discussed in our previous work, even starting from the same point in phase space, QM/MM and ML/MM trajectories diverge after a few hundred femtoseconds, since small differences in forces can lead to non-negligible structural deviations. Nevertheless, we observed that the frequencies of the geometrical oscillations remain comparable, which is also evident here: the spectra are essentially superimposable in terms of vibrational frequencies. The differences observed in peak intensities can be ascribed to the relatively short simulation time scale, which may cause the energy to be unevenly distributed among the normal modes in the two trajectories. To analyze the accuracy of the dipole predictions, we recomputed the IR spectrum along the QM/MM trajectory but using ML-predicted dipole moments (see Figure S7 of the SI). This spectrum closely matches the spectrum obtained with QM dipole moments, demonstrating the accuracy of the ML prediction of the dipole moment.
The power spectra of uracil for the ML/MM and QM/MM simulations are presented in Figure S3 of the SI. In this plot, the peaks at ∼2200 and ∼3800 cm⁻¹, which appear in both the QM/MM and ML/MM IR spectra, are not visible. These frequencies can be attributed to the water solvent, modeled here with a flexible TIP3P model (see Figure S4 in the SI). Although the dipole of the water molecules is not considered in the calculation of the IR spectrum, our ML/MM model explicitly accounts for the polarization of the ML (QM) region due to the MM part. Consequently, fluctuations in the environment induce corresponding fluctuations in the uracil dipole moment, which are then reflected in the IR spectrum. Because the flexible TIP3P model places the water stretching and bending modes at unphysical frequencies, we also performed a simulation of aqueous uracil using the flexible SPC/Fw water model.
As shown in Figure S5 of the SI, the SPC/Fw model shifts both solvent peaks toward lower wavenumbers; specifically, the peak at 2200 cm⁻¹ vanishes, and the stretching mode appears at ∼3650 cm⁻¹, offering a more realistic spectral representation. The power spectrum (Figure S4 in the SI) reveals the bending mode at ∼1500 cm⁻¹, where it overlaps with uracil vibrations. Notably, the uracil-related peaks remain consistent across both water models, so we retained the TIP3P simulations, as the separation between solvent and solute vibrations allowed for a more straightforward interpretation.
As described in the Computational Details, we also implemented a Δ-learning scheme to refine the predictions toward a higher level of theory (here, B2PLYP-D3/cc-pVTZ). Table 2 presents the resulting errors in energies, forces, and dipole moments obtained with this correction strategy (see eqs 30 and 31). The errors are slightly higher than those obtained with the base-models, and this can be primarily attributed to the ΔEnv model.
Table 2. Errors (RMSE) on Energy, Forces, and Dipole Moment Calculated on the Test Set of Aqueous Uracil Computed with M_Base + Δ-models
| property | Vac + ΔVac | Env + ΔEnv | Tot + ΔTot |
|---|---|---|---|
| energies (kcal·mol⁻¹) | 0.02 | 2.86 | 2.86 |
| forces (kcal·mol⁻¹·Å⁻¹) | 0.25 | 1.86 | 1.89 |
| dipole gas-phase (a.u.) | | 0.03 | |
| dipole environment (a.u.) | | 0.10 | |
The second, third, and fourth columns indicate the error of the vacuum models (Vac + ΔVac), environment models (Env + ΔEnv), and of the total ML/MM model (Tot + ΔTot = Vac + Env + ΔVac + ΔEnv). In the bottom part, we report errors on the gas-phase and solvated dipole moments predicted with the Env + ΔEnv model. The reference level of theory is B2PLYP-D3/cc-pVTZ.
To assess the impact of the ΔEnv model on the IR spectra, we performed two sets of ML/MM MD simulations of 1 ns. In the first, the Δ-learning correction was applied to both the vacuum and environment models, while in the second only the vacuum model was corrected with Δ-learning, and the environment was kept at the base-model level. Note that the corresponding QM/MM simulations at the B2PLYP-D3/cc-pVTZ level would be extremely expensive (∼0.4 ps/day), whereas Δ-learning adds a negligible overhead to the simulations (see Table S1 in the Supporting Information). The corresponding solvated spectra for both cases are shown in Figure 5 and demonstrate that the contribution of the environment Δ-learning model is negligible.
Figure 5. Comparison between spectra of aqueous uracil obtained from simulations in which we correct both vacuum and environment with the Δ-models (M_Base + ΔVac + ΔEnv), and the case in which only the vacuum is corrected (M_Base + ΔVac).
This finding is consistent with the small dipole-moment error already obtained using the environment base model for the B2PLYP-D3/cc-pVTZ reference (see Figure S2 in the SI). This observation justifies applying the correction exclusively to the vacuum model for the remaining systems. Accordingly, we define the corrected model as M Δ = M Base + ΔVac, where M Base and M Δ target the ωB97XD/6-31G(d) and the B2PLYP-D3/cc-pVTZ levels of theory, respectively.
Since harmonic spectra represent the standard approach for computing IR spectra, we compared the ML/MM spectrum with the harmonic spectrum evaluated at the QM/continuum level using water as solvent. The continuum model used here is the Integral Equation Formalism of the Polarizable Continuum Model (hereafter PCM) as implemented in Gaussian16.
The spectra reported in Figure 6 exhibit good agreement in terms of vibrational frequencies. The main discrepancies appear at ∼870, ∼2200, and ∼3800 cm⁻¹. As already discussed, the latter two modes arise from the flexible TIP3P water. A geometrical analysis links the peaks in the region between 500 and 1000 cm⁻¹ to uracil wagging motions. We hypothesize that the out-of-plane dipole variation along these modes may not be accurately described by our model.
Figure 6. Comparison of the IR spectra of aqueous uracil obtained from ML/MM simulations and harmonic QM/PCM calculations at the B2PLYP-D3/cc-pVTZ level of theory.
We finally compare our results with experimental IR spectra. Figure compares the gas-phase and solvated spectra calculated with the M Δ model to the experiments in water and in nitrogen matrix. Simulations of the isolated molecule were performed at 10 K, whereas aqueous uracil was simulated at 300 K, consistent with experimental conditions. Figure S6 in the SI compares the spectra generated from ML/MM simulations of aqueous uracil performed at 300 K using Langevin friction coefficients γL = 0.1 ps–1 and γL = 1 ps –1. The overlapping profiles demonstrate that the higher friction coefficient γL = 1 ps –1 does not introduce any artificial broadening.
7.

Comparison of IR spectra for gas-phase and aqueous uracil. Reference level of theory for the top panel is B2PLYP-D3/cc-pVTZ, whereas in the bottom panel, the experimental spectra are reported. The gas-phase spectra in different windows are taken from ref . and the aqueous one from ref . The * symbols indicate peaks that can be classified as overtones or resonance bands in the experimental vacuum spectrum.
The ML-predicted spectra generally show good agreement with the experimental results. The most notable discrepancies occur in the 1600–1800 cm–1 region, where the experimental gas-phase spectrum displays three intense peaks instead of two, arising from the two CO stretching modes. In this region, the uracil spectrum shows various Fermi resonances, which cause peak splitting. − A previous computational analysis of the experimental spectrum assigned the band at ∼1761 cm–1 to one carbonyl group, while the other two at ∼1704 and ∼1736 cm–1 were interpreted as combinations of the stretching vibration of the second carbonyl with other modes. This splitting complicates the quantitative evaluation of the experimental solvatochromic shift, as it prevents a clear identification of fundamental frequencies. These effects are also present in the aqueous spectrum and may contribute to the observed broadening. For our quantitative evaluation of the shift, we considered the carbonyl peak at the lowest wavenumber (∼1704 cm–1 in vacuum and ∼1637 cm–1 in water for the experiments).
The IR spectra and the corresponding quantitative shifts obtained with M Base are reported in Figures S8 and S9 and in Table S2 in the Supporting Information. Both models correctly reproduce the red shift of the carbonyl stretching mode (Δν1 at ∼1700 cm–1) and the blue shift of the C–N motion (Δν2 at ∼1200 cm–1). Using M Δ, we obtained a red shift of 58 cm–1 for Δν1, compared to 67 cm–1 observed experimentally, and a blue shift of 25 cm–1 for Δν2, in good agreement with the experimental value of 30 cm–1. Furthermore, this higher-level model appears to provide a closer match to experimental peak shapes.
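The shifts quoted above are differences between a peak position in solution and the corresponding one in vacuum. A minimal sketch of that bookkeeping is given below; the function names and the simple "highest point in a window" peak picking are assumptions made for illustration, not the procedure used in this work. The experimental carbonyl positions are the ones given in the text.

```python
def peak_position(wavenumbers, intensities, lo, hi):
    """Wavenumber of the most intense point inside the window [lo, hi]."""
    window = [(inten, nu) for nu, inten in zip(wavenumbers, intensities)
              if lo <= nu <= hi]
    if not window:
        raise ValueError("no spectral points inside the requested window")
    # max() compares intensities first, so it returns the strongest point.
    return max(window)[1]


def solvatochromic_shift(nu_solvent, nu_vacuum):
    """Shift in cm^-1; negative values are red shifts, positive are blue shifts."""
    return nu_solvent - nu_vacuum


# Experimental lowest-wavenumber carbonyl peak of uracil (values from the text).
exp_shift = solvatochromic_shift(1637.0, 1704.0)
print(exp_shift)  # -67.0, i.e. a red shift of 67 cm^-1
```

In practice `peak_position` would be applied to the simulated vacuum and solvated spectra in the same window before taking the difference.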
4.2. N-Methylacetamide
For N-methylacetamide (NMA), as for uracil, we trained both the base models and the ΔVac correction. The additional difficulty with this molecule stems from the presence of chemically equivalent atoms in the methyl groups, which were treated using a symmetrized kernel as explained in Section .
Table 3 shows the errors for the vacuum and environment models, as well as for their sum. As in the previous case, the environment model exhibits a larger error, although it is still able to reproduce the gas-phase and QM/MM dipole moment with good accuracy. The last column of Table 3 shows the error of the base+Δvacuum model with respect to the B2PLYP-D3/cc-pVTZ reference calculations in vacuum.
Table 3. Errors (RMSE) on Energy, Forces, and Dipole Moment Calculated on a Test Set of Aqueous NMA.
| property | Vac | Env | M Base | Vac + ΔVac |
|---|---|---|---|---|
| energies (kcal·mol–1) | 0.16 | 1.42 | 1.45 | 0.14 |
| forces (kcal·mol–1·Å–1) | 0.61 | 2.17 | 2.18 | 0.62 |
| dipole gas-phase (a.u.) | 0.02 | | | |
| dipole environment (a.u.) | 0.05 | | | |
The second, third, and fourth columns indicate the error of the vacuum model (Vac), environment model (Env), and of the total ML/MM model (M Base = Vac + Env). In the bottom part, we report errors on the gas-phase and solvated dipole moments predicted with the Env model. These errors are computed with respect to the reference ωB97XD/6-31G(d) level. The last column represents the error of the base+Δ vacuum model with respect to the B2PLYP-D3/cc-pVTZ reference in vacuum.
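For readers who want to reproduce this kind of error analysis, the sketch below shows how an RMSE for the summed model (M Base = Vac + Env) can be evaluated against reference values. The function names are illustrative and the numbers are invented toy values, not data from this work.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between two equally long sequences."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def base_model(vac_pred, env_pred):
    """M Base prediction as the sum of the Vac and Env contributions."""
    return [v + e for v, e in zip(vac_pred, env_pred)]


# Invented toy energies (kcal/mol), NOT values from the paper.
vac_pred = [0.0, 1.0, 2.0]
env_pred = [-3.0, -2.5, -4.0]
reference_total = [-3.0, -1.0, -2.0]

print(rmse(base_model(vac_pred, env_pred), reference_total))
```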
In Figure 8 we report the correlation of the dipole moments predicted by the Env model with the B2PLYP-D3/cc-pVTZ calculations. In this case too, the base model predicts the dipole moment at the higher level quite well.
Figure 8. Correlation plot of the dipole moment computed on a test set of aqueous N-methylacetamide. The prediction is made using the Env model and compared with reference calculations at the B2PLYP-D3/cc-pVTZ level.
As a further analysis, we compare the ML/MM spectrum of aqueous NMA with the QM/PCM harmonic approach (Figure 9). To understand the effect of hydrogen bonding, we consider both pure PCM calculations and calculations including three explicit water molecules in the QM region (while the rest of the solvent is described with PCM). We included two H2O molecules hydrogen-bonded to the carbonyl group and one interacting with the N–H group. This is a challenging comparison: excluding explicit water molecules neglects hydrogen-bonding effects, while including them couples the water vibrational modes with those of NMA, generating peaks that include the contribution of the dipole of the explicit water molecules. Here we focus on the 1000–1900 cm–1 region. Overall, the QM/PCM harmonic spectrum is consistent with the ML/MM one, except in the carbonyl region, where the absence of hydrogen bonds causes a marked separation between the Amide I (∼1700 cm–1) and Amide II (∼1600 cm–1) bands. On the other hand, a quantum-mechanical description of the hydrogen bonds immediately close to the NMA molecule reduces the separation between the two peaks, highlighting the significant role of hydrogen bonding in the accurate description of this spectrum. The ML/MM approach yields results intermediate between the PCM-only case and the microsolvation case, suggesting that our scheme partially captures hydrogen-bonding effects but remains limited by the absence of mutual polarization between NMA and water molecules.
Figure 9. Comparison of the IR spectrum obtained from ML/MM simulations of aqueous NMA with the harmonic spectra computed in PCM water and with three hydrogen-bonded water molecules in the QM part. The harmonic spectra are computed at the B2PLYP-D3/cc-pVTZ level.
Figure 10 compares the IR spectra of NMA in gas phase, water, and chloroform obtained with the M Δ model with the experiments. Simulations were performed at 20 K for vacuum and 300 K for ML/MM, again matching the experimental conditions.
Figure 10. Comparison of IR spectra for NMA in gas-phase, chloroform, and water. Reference level of theory for the top panel is B2PLYP-D3/cc-pVTZ, whereas in the bottom panel, the experimental spectrum is reported. The gas-phase spectrum is taken from ref . while the aqueous and chloroform spectra are from refs and , respectively.
The predicted spectra in water reproduce well the main solvent effects observed in the experiments, namely a red shift of the Amide I peak and a blue shift of the Amide II and Amide III peaks (∼1600 and ∼1300 cm–1, respectively). A more quantitative comparison (Table 4) reveals that these shifts are slightly underestimated by the ML/MM simulations. From the comparison with the PCM calculations detailed above, which highlighted the strong influence of the hydrogen-bond description on IR spectra, we can conclude that a purely electrostatic embedding does not fully capture the effect of H-bonds on these frequencies. Nonetheless, the agreement is almost quantitative. Similarly, the predicted spectrum in chloroform resembles the experimental one, although it is closer to the gas-phase spectrum than observed experimentally. The trend of the Amide I peak is correctly reproduced, and, as in water, its shift is slightly underestimated. In contrast, the Amide II and Amide III bands are almost superimposed on their vacuum counterparts. The discrepancy with the experimentally observed shift of the Amide II peak can be ascribed to the lack of solvent polarization, which plays an important role in describing the effects of low-polarity solvents. Notably, the Amide II shift is difficult to determine quantitatively due to the peculiar shape of the experimental vacuum peak. IR spectra and shifts obtained with M Base are reported in Figures S11 and S12 and in Table S3 in the Supporting Information.
Table 4. Solvatochromic Shift (cm–1) of the IR Spectra of NMA in Water and Chloroform.
| | water Δν1 (Amide I) | water Δν2 (Amide II) | water Δν3 (Amide III) | chloroform Δν1 | chloroform Δν2 | chloroform Δν3 |
|---|---|---|---|---|---|---|
| Exp. | –90 | 68 | 57 | –39 | 15 | |
| M Δ | –77 | 53 | 41 | –17 | –6 | –1 |
4.3. Alanine Dipeptide
For Ala2 as well, the base models and the ΔVac model were trained. Here, the Vac and ΔVac models are trained on energies and forces, and the Env model is trained on energies, forces, and total charge. We employed the symmetrized kernels to account for the permutational invariance of hydrogen atoms in the three methyl groups. Table 5 shows the errors of the Vac and Env models, and of their sum (M Base). The test set in this case contains frames extracted from the equilibration of both αR and PII trajectories and for both water and DMSO solvents. The last column of the table also reports the error of the Vac+ΔVac model with respect to the B2PLYP-D3/cc-pVTZ reference calculations in vacuum. Figure 11 reports the correlation plot for the dipole moment of solvated Ala2, using the B2PLYP-D3/cc-pVTZ calculations as the target.
Table 5. Errors (RMSE) on Energy, Forces, and Dipole Moment Calculated on a Combined Test Set of Ala2 in Water and DMSO.
| property | Vac | Env | M Base | Vac + ΔVac |
|---|---|---|---|---|
| energies (kcal·mol–1) | 0.58 | 1.75 | 1.84 | 0.56 |
| forces (kcal·mol–1·Å–1) | 1.33 | 1.93 | 2.33 | 1.32 |
| dipole gas-phase (a.u.) | 0.06 | | | |
| dipole environment (a.u.) | 0.11 | | | |
The second, third, and fourth columns indicate the error of the vacuum model (Vac), environment model (Env), and of the total ML/MM model (M Base = Vac + Env). In the bottom part, we report errors on the gas-phase and solvated dipole moments predicted with the Env model. These errors are computed with respect to the reference ωB97XD/6-31G(d) level. The last column represents the error of the base+Δ vacuum model with respect to the B2PLYP-D3/cc-pVTZ reference in vacuum.
Figure 11. Correlation plot of the dipole moment computed on a test set of solvated Ala2. The prediction is made using the Env model and compared with reference calculations at the B2PLYP-D3/cc-pVTZ level.
Ala2 errors are slightly higher than those observed for the other two molecules, but this can be primarily attributed to the vacuum model rather than the environment model. This is not surprising, given the conformational flexibility of Ala2 and the bias of the vacuum data set toward optimized geometries and their local regions. A possible way to address this issue is active learning, in which an “explorative” simulation is performed with the current models and new geometries are extracted to enrich the data set. As the aim of our work is to evaluate the potential of our data set generation protocol and model independently of the solvent, we evaluate our model on ML/MM dynamics without refitting. To this end, we compare IR spectra simulated in two different solvents, DMSO and water.
The literature reports that Ala2 can adopt three preferential conformations in water, αR, β, and PII, with their interconversion being barrierless (β to PII) or with a rather low barrier (β/PII to αR). For this reason, and consistent with previous work on simulating the Ala2 IR spectrum using MD, we performed two sets of simulations (10 replicas each), starting from the αR and PII conformations. The IR spectrum of αR was obtained by averaging trajectories that started and remained in this conformation, while the spectrum of PII/β was derived from trajectories that started from PII and remained in either the PII or the β conformation. Notably, all replicas in DMSO remained in their respective target conformation. Figures S14 and S15 in the SI show the Ala2 spectra in water and DMSO for the 1000–1900 cm–1 region, computed from the αR and PII/β trajectories. In addition, Figure S16 of the SI reports the 2D histogram of the Φ and Ψ angles explored during those simulations. For comparison with experiments, the two spectra were combined with 1:5 weighting (αR:PII/β), as proposed in ref .
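The 1:5 combination of the two conformer spectra can be written as a weighted average. The sketch below assumes that both spectra are sampled on the same wavenumber grid and that the result is normalized by the sum of the weights; both choices, as well as the function name, are assumptions of this illustration.

```python
def combine_spectra(spec_alpha_r, spec_pii_beta, w_alpha_r=1.0, w_pii_beta=5.0):
    """Weighted average of two spectra defined on the same wavenumber grid."""
    if len(spec_alpha_r) != len(spec_pii_beta):
        raise ValueError("both spectra must share the same grid")
    total = w_alpha_r + w_pii_beta
    return [(w_alpha_r * a + w_pii_beta * b) / total
            for a, b in zip(spec_alpha_r, spec_pii_beta)]


# Toy intensities on a two-point grid, for illustration only.
print(combine_spectra([6.0, 0.0], [0.0, 6.0]))  # [1.0, 5.0]
```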
In Figure 12 we compare the IR spectra (Amide I region) obtained with M Δ for Ala2 in water and DMSO with their experimental counterparts. The model correctly captures the solvatochromic shift, predicting –38 cm–1 against the measured –29 cm–1. The spectra and solvatochromic shift obtained with the M Base model are provided in Figure S17 and in Table S4 in the Supporting Information, and show similar trends.
Figure 12. Comparison of IR spectra in the Amide I region for Ala2 in water and DMSO. Reference level of theory for the left panel is B2PLYP-D3/cc-pVTZ, whereas in the right panel the experimental spectrum is reported. The experimental spectra are taken from ref .
These results suggest that our ML/MM model can capture not only the effects of solvent polarity but, more generally, the specific interactions that differentiate two polar solvents like water and DMSO. We recall that for both solvents we employed the same model, trained on artificial charge configurations. Our results thus indicate a good transferability of the ML/MM model across environments.
5. Conclusions
We have presented a general protocol for computing IR spectra of solvated molecules using ML/MM simulations within the electrostatic embedding framework. For any new molecule, the protocol begins with the construction of a vacuum data set, generated by applying random displacements along normal modes of optimized geometries at one or more minima. Optimally defined random charges are then placed around each isolated geometry, producing a solvent-agnostic data set suitable for environment training. Strikingly, this approach allows us to avoid solvent-specific simulations for extracting environment configurations. This strategy is completely general and would be particularly advantageous for generating large data sets, as in the case of training neural network models.
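As a rough illustration of the first step of this protocol, the snippet below displaces a reference geometry along a set of normal-mode vectors with random amplitudes. The Gaussian amplitude distribution, the `sigma` parameter, and the function name are assumptions made for this sketch; the actual sampling widths and the rule used to place the random charges are not reproduced here.

```python
import random


def displace_along_modes(geometry, modes, sigma, rng):
    """Return a copy of `geometry` displaced by random amounts along each mode.

    geometry: list of [x, y, z] per atom
    modes:    list of modes, each a list of [dx, dy, dz] per atom
    sigma:    standard deviation of the random amplitude (assumed Gaussian)
    rng:      a random.Random instance, so that sampling is reproducible
    """
    displaced = [list(atom) for atom in geometry]
    for mode in modes:
        amplitude = rng.gauss(0.0, sigma)
        for atom, disp in zip(displaced, mode):
            for k in range(3):
                atom[k] += amplitude * disp[k]
    return displaced


# Toy diatomic along z with a single stretching mode (illustrative values only).
geometry = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
stretch = [[0.0, 0.0, -1.0], [0.0, 0.0, 1.0]]
sample = displace_along_modes(geometry, [stretch], sigma=0.05, rng=random.Random(0))
print(sample)
```

Each sampled geometry would then be sent to the reference QM calculation to obtain the training labels.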
In this work, both the vacuum and environment models were trained using GPR, and subsequently employed for ML/MM simulations, where the environment model provided partial charges for dipole moment predictions at each MD step. The IR spectrum was then obtained as the Fourier transform of the time-derivative of the dipole–dipole correlation function.
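The two ingredients of this last step can be sketched as follows: a point-charge dipole for each frame, and an IR line shape evaluated from the dipole time derivative. The sketch uses one common convention, the power spectrum of the dipole derivative, which by the Wiener-Khinchin theorem corresponds to the Fourier transform of its autocorrelation function. Prefactors, window functions, and quantum correction factors are omitted, and the direct Fourier sum at a chosen wavenumber replaces an FFT only to keep the example dependency-free. All names are illustrative assumptions.

```python
import cmath
import math

C_CM_PER_FS = 2.99792458e-5  # speed of light in cm/fs


def dipole_from_charges(charges, positions):
    """Point-charge dipole of one frame: mu = sum_i q_i * r_i."""
    return [sum(q * r[k] for q, r in zip(charges, positions)) for k in range(3)]


def ir_intensity(dipoles, dt_fs, wavenumber_cm1):
    """Unnormalized IR intensity at one wavenumber from a dipole trajectory."""
    # Central finite differences for the dipole time derivative.
    deriv = [[(dipoles[t + 1][k] - dipoles[t - 1][k]) / (2.0 * dt_fs)
              for k in range(3)]
             for t in range(1, len(dipoles) - 1)]
    omega = 2.0 * math.pi * C_CM_PER_FS * wavenumber_cm1  # rad/fs
    total = 0.0
    for k in range(3):
        # The constant time offset of the derivative only adds a global phase.
        ft = sum(d[k] * cmath.exp(-1j * omega * t * dt_fs)
                 for t, d in enumerate(deriv))
        total += abs(ft) ** 2
    return total / len(deriv)


# Synthetic dipole oscillating at 1700 cm^-1 along x (toy data, 2 ps at 1 fs).
dt = 1.0
traj = [[math.cos(2.0 * math.pi * C_CM_PER_FS * 1700.0 * t * dt), 0.0, 0.0]
        for t in range(2000)]
on_peak = ir_intensity(traj, dt, 1700.0)
off_peak = ir_intensity(traj, dt, 1500.0)
print(on_peak > off_peak)
```

Scanning `ir_intensity` over a grid of wavenumbers gives the full line shape; for long trajectories an FFT-based implementation would be the practical choice.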
Across all studied systems, our protocol successfully reproduced experimental solvatochromic shifts. Notably, the environment model could distinguish not only between significantly different environments, such as gas phase, chloroform, and water (as demonstrated for Ura and NMA), but also between two polar solvents (water and DMSO for Ala2) despite being trained without solvent-specific data. This highlights that the electrostatic potential serves as an effective, though approximate, descriptor for encoding the electrostatics of the environment. Furthermore, applying Δ-learning to correct only the vacuum model results in a negligible increase of the computational cost while yielding a clear improvement in the simulated IR spectra. The methodology presented here is particularly useful for studying the effect of different solvents on the same molecule, as kernel methods allow efficient model training with limited data (1000 samples per model in our case).
Future developments could involve more refined environment representations. For instance, incorporating the electric field in addition to the electrostatic potential would provide directional information about the environment. A current limitation is that kernel methods are molecule-specific. A significant future goal is the development of more general frameworks capable of incorporating environmental effects across multiple molecular systems.
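To make the distinction between the two descriptors concrete, the sketch below evaluates the electrostatic potential and the electric field generated by a set of MM point charges at a solute atom site, in atomic units. It is a plain Coulomb sum with an illustrative function name, not the descriptor implementation used in this work. A single charge placed on either side of the site gives the same potential but opposite fields, which is the directional information mentioned above.

```python
import math


def potential_and_field(site, mm_charges, mm_positions):
    """Coulomb potential V and field E at `site` from MM point charges (a.u.).

    V = sum_j q_j / |r - R_j|
    E = sum_j q_j (r - R_j) / |r - R_j|^3
    """
    potential = 0.0
    field = [0.0, 0.0, 0.0]
    for q, pos in zip(mm_charges, mm_positions):
        d = [site[k] - pos[k] for k in range(3)]
        r = math.sqrt(sum(x * x for x in d))
        potential += q / r
        for k in range(3):
            field[k] += q * d[k] / r ** 3
    return potential, field


site = [0.0, 0.0, 0.0]
v_up, e_up = potential_and_field(site, [1.0], [[0.0, 0.0, 2.0]])
v_down, e_down = potential_and_field(site, [1.0], [[0.0, 0.0, -2.0]])
print(v_up, v_down)        # 0.5 0.5
print(e_up[2], e_down[2])  # -0.25 0.25
```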
Supplementary Material
Acknowledgments
L.C. and B.M. acknowledge financial support from ICSC-Centro Nazionale di Ricerca in high-performance computing, big data, and quantum computing, funded by the European Union-NextGenerationEU-PNRR, Missione 4 Componente 2 Investimento 1.4.
All data and scripts useful for this work are available in a Zenodo repository at 10.5281/zenodo.18391996. In particular, we have provided the data sets with both levels of theory, the Python scripts for generating new geometries (including normal-mode displacements and artificial environment configurations), the trained models for the three studied molecules, and the dipole moment data used for the IR spectra computations. Software: ML-server: https://github.com/Molecolab-Pisa/ML-server; GPX: https://github.com/Molecolab-Pisa/GPX (permut_symm branch); Moldex: https://github.com/Molecolab-Pisa/moldex.
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jctc.5c01959.
Learning curves for all the trained models (uracil, NMA, and alanine dipeptide); correlation plot of the dipole moment computed with the base-models and compared with the B2PLYP-D3/cc-pVTZ level of theory (uracil); power spectra computed on QM/MM and ML/MM simulations (uracil); IR spectra and power spectra of TIP3P and SPC/Fw water molecules during ML/MM simulations (uracil); IR spectra obtained with ML/MM simulations using two different Langevin friction coefficients (uracil); comparison between the IR spectra obtained from QM/MM and ML/MM dipole moments on the QM/MM trajectory (uracil); comparison of the IR spectra obtained with the base-models and Δ-learning corrected models with the corresponding harmonic spectra in PCM (uracil and NMA); IR spectra obtained with the base-models and Δ-learning corrected models averaged on αR and PII trajectories in water and DMSO (alanine dipeptide); 2D histogram of the values of Ramachandran angles during the ML/MM simulations performed in water and DMSO (alanine dipeptide); comparison of the IR spectra obtained with the base-models and Δ-learning corrected models with the corresponding experimental spectra (uracil, NMA, and alanine dipeptide); additional tables: timings for the performed ML/MM simulations; and quantitative solvatochromic shifts for base-models and Δ-learning corrected models compared with experimental ones for the three molecules (PDF)
The authors declare no competing financial interest.
References
- Blasiak B., Londergan C. H., Webb L. J., Cho M. Vibrational Probes: From Small Molecule Solvatochromism Theory and Experiments to Applications in Complex Systems. Acc. Chem. Res. 2017;50:968–976. doi: 10.1021/acs.accounts.7b00002.
- Morzan U. N., Alonso de Armino D. J., Foglia N. O., Ramirez F., Gonzalez Lebrero M. C., Scherlis D. A., Estrin D. A. Spectroscopy in complex environments from QM–MM simulations. Chem. Rev. 2018;118:4071–4113. doi: 10.1021/acs.chemrev.8b00026.
- Jansen T. L. C. Computational spectroscopy of complex systems. J. Chem. Phys. 2021;155:170901. doi: 10.1063/5.0064092.
- Barone V., Alessandrini S., Biczysko M., Cheeseman J. R., Clary D. C., McCoy A. B., DiRisio R. J., Neese F., Melosso M., Puzzarini C. Computational Molecular Spectroscopy. Nat. Rev. Methods Primers. 2021;1:38. doi: 10.1038/s43586-021-00034-1.
- Borowski P. An evaluation of scaling factors for multiparameter scaling procedures based on DFT force fields. J. Phys. Chem. A. 2012;116:3866–3880. doi: 10.1021/jp212201f.
- Merrick J. P., Moran D., Radom L. An evaluation of harmonic vibrational frequency scale factors. J. Phys. Chem. A. 2007;111:11683–11700. doi: 10.1021/jp073974n.
- Cramer C. J., Truhlar D. G. Implicit Solvation Models: Equilibria, Structure, Spectra, and Dynamics. Chem. Rev. 1999;99:2161–2200. doi: 10.1021/cr960149m.
- Tomasi J., Mennucci B., Cammi R. Quantum Mechanical Continuum Solvation Models. Chem. Rev. 2005;105:2999–3094. doi: 10.1021/cr9904009.
- Mennucci B. Modeling environment effects on spectroscopies through QM/classical models. Phys. Chem. Chem. Phys. 2013;15:6583. doi: 10.1039/C3CP44417A.
- Giovannini T., Cappelli C. Continuum vs. atomistic approaches to computational spectroscopy of solvated systems. Chem. Commun. 2023;59:5644–5660. doi: 10.1039/D2CC07079K.
- Thomas M., Brehm M., Fligg R., Vöhringer P., Kirchner B. Computing vibrational spectra from ab initio molecular dynamics. Phys. Chem. Chem. Phys. 2013;15:6608–6622. doi: 10.1039/c3cp44302g.
- Gaigeot M.-P. Some opinions on MD-based vibrational spectroscopy of gas phase molecules and their assembly: An overview of what has been achieved and where to go. Spectrochim. Acta. A Mol. Biomol. Spectrosc. 2021;260:119864. doi: 10.1016/j.saa.2021.119864.
- Ditler E., Luber S. Vibrational spectroscopy by means of first-principles molecular dynamics simulations. WIREs Comput. Mol. Sci. 2022;12:e1605. doi: 10.1002/wcms.1605.
- Taherivardanjani S., Elfgen R., Reckien W., Suarez E., Perlt E., Kirchner B. Benchmarking the computational costs and quality of vibrational spectra from ab initio simulations. Adv. Theory Simul. 2022;5:2100293. doi: 10.1002/adts.202100293.
- Macaluso V., Hashem S., Nottoli M., Lipparini F., Cupellini L., Mennucci B. Ultrafast transient infrared spectroscopy of photoreceptors with polarizable QM/MM dynamics. J. Phys. Chem. B. 2021;125:10282–10292. doi: 10.1021/acs.jpcb.1c05753.
- Nottoli M., Bondanza M., Mazzeo P., Cupellini L., Curutchet C., Loco D., Lagardère L., Piquemal J., Mennucci B., Lipparini F. QM/AMOEBA description of properties and dynamics of embedded molecules. WIREs Comput. Mol. Sci. 2023;13:e1674. doi: 10.1002/wcms.1674.
- Bovi D., Mezzetti A., Vuilleumier R., Gaigeot M.-P., Chazallon B., Spezia R., Guidoni L. Environmental effects on vibrational properties of carotenoids: experiments and calculations on peridinin. Phys. Chem. Chem. Phys. 2011;13:20954–20964. doi: 10.1039/c1cp21985e.
- Smith J. S., Isayev O., Roitberg A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 2017;8:3192–3203. doi: 10.1039/C6SC05720A.
- Chmiela S., Sauceda H. E., Poltavsky I., Müller K.-R., Tkatchenko A. sGDML: Constructing accurate and data efficient molecular force fields using machine learning. Comput. Phys. Commun. 2019;240:38–45. doi: 10.1016/j.cpc.2019.02.007.
- Unke O. T., Chmiela S., Sauceda H. E., Gastegger M., Poltavsky I., Schütt K. T., Tkatchenko A., Müller K.-R. Machine learning force fields. Chem. Rev. 2021;121:10142–10186. doi: 10.1021/acs.chemrev.0c01111.
- Poltavsky I., Tkatchenko A. Machine learning force fields: Recent advances and remaining challenges. J. Phys. Chem. Lett. 2021;12:6551–6564. doi: 10.1021/acs.jpclett.1c01204.
- Pinheiro M., Ge F., Ferré N., Dral P. O., Barbatti M. Choosing the right molecular machine learning potential. Chem. Sci. 2021;12:14396–14413. doi: 10.1039/D1SC03564A.
- Dral P. O., Barbatti M. Molecular excited states through a machine learning lens. Nat. Rev. Chem. 2021;5:388–405. doi: 10.1038/s41570-021-00278-1.
- Zubatiuk T., Isayev O. Development of multimodal machine learning potentials: toward a physics-aware artificial intelligence. Acc. Chem. Res. 2021;54:1575–1585. doi: 10.1021/acs.accounts.0c00868.
- Batatia I., Kovacs D. P., Simm G., Ortner C., Csányi G. MACE: Higher order equivariant message passing neural networks for fast and accurate force fields. Adv. Neural Inf. Process. Syst. 2022;35:11423–11436.
- Hou Y.-F., Ge F., Dral P. O. Explicit learning of derivatives with the KREG and pKREG models on the example of accurate representation of molecular potential energy surfaces. J. Chem. Theory Comput. 2023;19:2369–2379. doi: 10.1021/acs.jctc.2c01038.
- Hofstetter A., Böselt L., Riniker S. Graph-convolutional neural networks for (QM) ML/MM molecular dynamics simulations. Phys. Chem. Chem. Phys. 2022;24:22497–22512. doi: 10.1039/D2CP02931F.
- Mazzeo P., Cignoni E., Arcidiacono A., Cupellini L., Mennucci B. Electrostatic embedding machine learning for ground and excited state molecular dynamics of solvated molecules. Digit. Discovery. 2024;3:2560–2571. doi: 10.1039/D4DD00295D.
- Zinovjev K., Hedges L., Montagud Andreu R., Woods C., Tuñón I., van der Kamp M. W. emle-engine: A flexible electrostatic machine learning embedding package for multiscale molecular dynamics simulations. J. Chem. Theory Comput. 2024;20:4514–4522. doi: 10.1021/acs.jctc.4c00248.
- Grassano J. S., Pickering I., Roitberg A. E., Gonzalez Lebrero M. C., Estrin D. A., Semelak J. A. Assessment of embedding schemes in a hybrid machine learning/classical potentials (ML/MM) approach. J. Chem. Inf. Model. 2024;64:4047–4058. doi: 10.1021/acs.jcim.4c00478.
- Semelak J. A., Pickering I., Huddleston K., Olmos J., Grassano J. S., Clemente C. M., Drusin S. I., Marti M., Gonzalez Lebrero M. C., Roitberg A. E., Estrin D. A. Advancing Multiscale Molecular Modeling with Machine Learning-Derived Electrostatics. J. Chem. Theory Comput. 2025;21:5194–5207. doi: 10.1021/acs.jctc.4c01792.
- Song G., Yang W. NepoIP/MM: Toward Accurate Biomolecular Simulation with a Machine Learning/Molecular Mechanics Model Incorporating Polarization Effects. J. Chem. Theory Comput. 2025;21:5588–5598. doi: 10.1021/acs.jctc.5c00372.
- Tiefenbacher M. X., Bachmair B., Chen C. G., Westermayr J., Marquetand P., Dietschreit J. C., González L. Excited-state nonadiabatic dynamics in explicit solvent using machine learned interatomic potentials. Digit. Discovery. 2025;4:1478–1491. doi: 10.1039/D5DD00044K.
- Barrett R., Dietschreit J. C. B., Westermayr J. Incorporating Long-Range Interactions via the Multipole Expansion into Ground and Excited-State Molecular Simulations. 2025; https://arxiv.org/abs/2502.21045.
- Pickering I., Semelak J. A., Xue J., Roitberg A. E. TorchANI-Amber: Bridging neural network potentials and classical biomolecular simulations. J. Phys. Chem. B. 2025;129:11927–11938. doi: 10.1021/acs.jpcb.5c05725.
- Grassano J. S., Pickering I., Roitberg A. E., Estrin D. A., Semelak J. A. From QM/MM to ML/MM: A new era in multiscale modeling. Chem. Phys. Rev. 2025;6:041304. doi: 10.1063/5.0260078.
- Zinovjev K., Curutchet C. Improved description of environment and vibronic effects with electrostatically embedded ML potentials. J. Phys. Chem. Lett. 2025;16:774–781. doi: 10.1021/acs.jpclett.4c02949.
- Han R., Ketkaew R., Luber S. A concise review on recent developments of machine learning for the prediction of vibrational spectra. J. Phys. Chem. A. 2022;126:801–812. doi: 10.1021/acs.jpca.1c10417.
- Westermayr J., Marquetand P. Machine learning spectroscopy to advance computation and analysis. Chem. Sci. 2025;16:21660. doi: 10.1039/D5SC05628D.
- Gastegger M., Behler J., Marquetand P. Machine learning molecular dynamics for the simulation of infrared spectra. Chem. Sci. 2017;8:6924–6935. doi: 10.1039/C7SC02267K.
- Zhang Y., Ye S., Zhang J., Hu C., Jiang J., Jiang B. Efficient and accurate simulations of vibrational and electronic spectra with symmetry-preserving neural network models for tensorial properties. J. Phys. Chem. B. 2020;124:7284–7290. doi: 10.1021/acs.jpcb.0c06926.
- Schienbein P. Spectroscopy from machine learning by accurately representing the atomic polar tensor. J. Chem. Theory Comput. 2023;19:705–712. doi: 10.1021/acs.jctc.2c00788.
- Xu N., Rosander P., Schäfer C., Lindgren E., Österbacka N., Fang M., Chen W., He Y., Fan Z., Erhart P. Tensorial properties via the neuroevolution potential framework: Fast simulation of infrared and Raman spectra. J. Chem. Theory Comput. 2024;20:3273–3284. doi: 10.1021/acs.jctc.3c01343.
- Litman Y., Behler J., Rossi M. Temperature dependence of the vibrational spectrum of porphycene: a qualitative failure of classical-nuclei molecular dynamics. Faraday Discuss. 2020;221:526–546. doi: 10.1039/C9FD00056A.
- Veit M., Wilkins D. M., Yang Y., DiStasio R. A., Ceriotti M. Predicting molecular dipole moments by combining atomic partial charges and atomic dipoles. J. Chem. Phys. 2020;153:024113. doi: 10.1063/5.0009106.
- Unke O. T., Meuwly M. PhysNet: A neural network for predicting energies, forces, dipole moments, and partial charges. J. Chem. Theory Comput. 2019;15:3678–3693. doi: 10.1021/acs.jctc.9b00181.
- Chen Y., Pios S. V., Gelin M. F., Chen L. Accelerating molecular vibrational spectra simulations with a physically informed deep learning model. J. Chem. Theory Comput. 2024;20:4703–4710. doi: 10.1021/acs.jctc.4c00173.
- Kabylda A., Frank J. T., Suárez-Dou S., Khabibrakhmanov A., Medrano Sandonas L., Unke O. T., Chmiela S., Müller K. R., Tkatchenko A. Molecular simulations with a pretrained neural network and universal pairwise force fields. J. Am. Chem. Soc. 2025;147:33723–33734. doi: 10.1021/jacs.5c09558.
- Bhatia N., Krejci O., Botti S., Rinke P., Marques M. A. MACE4IR: A foundation model for molecular infrared spectroscopy. arXiv preprint arXiv:2508.19118, 2025.
- Gastegger M., Schütt K. T., Müller K.-R. Machine learning of solvent effects on molecular spectra and reactions. Chem. Sci. 2021;12:11473–11483. doi: 10.1039/D1SC02742E.
- Arcidiacono A., Cignoni E., Mazzeo P., Cupellini L., Mennucci B. Predicting Solvatochromism of Chromophores in Proteins through QM/MM and Machine Learning. J. Phys. Chem. A. 2024;128:3646–3658. doi: 10.1021/acs.jpca.4c00249.
- Cignoni E., Cupellini L., Mennucci B. Machine learning exciton Hamiltonians in light-harvesting complexes. J. Chem. Theory Comput. 2023;19:965–977. doi: 10.1021/acs.jctc.2c01044.
- Hoja J., Medrano Sandonas L., Ernst B. G., Vazquez-Mayagoitia A., DiStasio R. A., Tkatchenko A. QM7-X, a comprehensive dataset of quantum-mechanical properties spanning the chemical space of small organic molecules. Sci. Data. 2021;8:43. doi: 10.1038/s41597-021-00812-2.
- Ganscha S., Unke O. T., Ahlin D., Maennel H., Kashubin S., Müller K. R. The QCML dataset, Quantum chemistry reference data from 33.5 M DFT and 14.7 B semi-empirical calculations. Sci. Data. 2025;12:406. doi: 10.1038/s41597-025-04720-7.
- Ramírez R., López-Ciudad T., Kumar P. P., Marx D. Quantum corrections to classical time-correlation functions: Hydrogen bonding and anharmonic floppy modes. J. Chem. Phys. 2004;121:3973–3983. doi: 10.1063/1.1774986.
- Valleau S., Eisfeld A., Aspuru-Guzik A. On the alternatives for bath correlators and spectral densities from mixed quantum-classical simulations. J. Chem. Phys. 2012;137:224103. doi: 10.1063/1.4769079.
- Singh U. C., Kollman P. A. An approach to computing electrostatic charges for molecules. J. Comput. Chem. 1984;5:129–145. doi: 10.1002/jcc.540050204.
- Besler B. H., Merz K. M. Jr, Kollman P. A. Atomic charges derived from semiempirical methods. J. Comput. Chem. 1990;11:431–439. doi: 10.1002/jcc.540110404.
- Jang H., Bayly C., Wang L.-P. respyte. 2018; https://github.com/lpwgroup/respyte (accessed November 18, 2025).
- Mulliken R. S. Electronic population analysis on LCAO–MO molecular wave functions. I. J. Chem. Phys. 1955;23:1833–1840. doi: 10.1063/1.1740588.
- Gaigeot M.-P. Infrared spectroscopy of the alanine dipeptide analog in liquid water with DFT-MD. Direct evidence for PII/β conformations. Phys. Chem. Chem. Phys. 2010;12:10198–10209. doi: 10.1039/c003485a.
- Chai J.-D., Head-Gordon M. Long-range corrected hybrid density functionals with damped atom–atom dispersion corrections. Phys. Chem. Chem. Phys. 2008;10:6615–6620. doi: 10.1039/b810189b.
- Hehre W. J., Ditchfield R., Pople J. A. Self-Consistent Molecular Orbital Methods. XII. Further Extensions of Gaussian-Type Basis Sets for Use in Molecular Orbital Studies of Organic Molecules. J. Chem. Phys. 1972;56:2257–2261. doi: 10.1063/1.1677527.
- Frisch M. J., Trucks G. W., Schlegel H. B., Scuseria G. E., Robb M. A., Cheeseman J. R., Scalmani G., Barone V., Petersson G. A., Nakatsuji H., Li X., Caricato M., Marenich A. V., Bloino J., Janesko B. G., Gomperts R., Mennucci B., Hratchian H. P., Ortiz J. V., Izmaylov A. F., Sonnenberg J. L., Williams-Young D., Ding F., Lipparini F., Egidi F., Goings J., Peng B., Petrone A., Henderson T., Ranasinghe D., Zakrzewski V. G., Gao J., Rega N., Zheng G., Liang W., Hada M., Ehara M., Toyota K., Fukuda R., Hasegawa J., Ishida M., Nakajima T., Honda Y., Kitao O., Nakai H., Vreven T., Throssell K., Montgomery J. A. Jr., Peralta J. E., Ogliaro F., Bearpark M. J., Heyd J. J., Brothers E. N., Kudin K. N., Staroverov V. N., Keith T. A., Kobayashi R., Normand J., Raghavachari K., Rendell A. P., Burant J. C., Iyengar S. S., Tomasi J., Cossi M., Millam J. M., Klene M., Adamo C., Cammi R., Ochterski J. W., Martin R. L., Morokuma K., Farkas O., Foresman J. B., Fox D. J. Gaussian 16, Revision C.01; Gaussian Inc.: Wallingford CT, 2016.
- Grimme S., Ehrlich S., Goerigk L.. Effect of the damping function in dispersion corrected density functional theory. J. Comput. Chem. 2011;32:1456–1465. doi: 10.1002/jcc.21759. [DOI] [PubMed] [Google Scholar]
- Goerigk L., Grimme S.. Efficient and Accurate Double-Hybrid-Meta-GGA Density Functionals-Evaluation with the Extended GMTKN30 Database for General Main Group Thermochemistry, Kinetics, and Noncovalent Interactions. J. Chem. Theory Comput. 2011;7:291–309. doi: 10.1021/ct100466k. [DOI] [PubMed] [Google Scholar]
- Dunning T. H.. Gaussian basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen. J. Chem. Phys. 1989;90:1007–1023. doi: 10.1063/1.456153. [DOI] [Google Scholar]
- Vazart F., Latouche C., Cimino P., Barone V.. Accurate Infrared (IR) Spectra for Molecules Containing the CN Moiety by Anharmonic Computations with the Double Hybrid B2PLYP Density Functional. J. Chem. Theory Comput. 2015;11:4364–4369. doi: 10.1021/acs.jctc.5b00638. [DOI] [PubMed] [Google Scholar]
- Cignoni, E. ; Mazzeo, P. ; Arcidiacono, A. ; Cupellini, L. ; Mennucci, B. . GPX: Gaussian Process Regression in JAX. 2023; https://github.com/Molecolab-Pisa/GPX (accessed August 25, 2025).
- Arcidiacono, A. ; Mazzeo, P. ; Cignoni, E. ; Cupellini, L. ; Mennucci, B. . Moldex: molecular descriptors in JAX. 2023; https://github.com/Molecolab-Pisa/moldex (accessed August 25, 2025).
- Bradbury, J. ; Frostig, R. ; Hawkins, P. ; Johnson, M. J. ; Leary, C. ; Maclaurin, D. ; Necula, G. ; Paszke, A. ; VanderPlas, J. ; Wanderman-Milne, S. ; Zhang, Q. . JAX: composable transformations of Python+NumPy programs. 2018; http://github.com/google/jax (accessed August 25, 2025).
- Mazzeo, P. ; Cignoni, E. ; Cupellini, L. ; Mennucci, B. . ML-server. 2024; https://github.com/Molecolab-Pisa/ML-server (accessed August 25, 2025).
- Woods C. J., Hedges L. O., Mulholland A. J., Malaisree M., Tosco P., Loeffler H. H., Suruzhon M., Burman M., Bariami S., Bosisio S., Calabro G., Clark F., Mey A. S. J. S., Michel J.. Sire: An interoperability engine for prototyping algorithms and exchanging information between molecular simulation programs. J. Chem. Phys. 2024;160:202503. doi: 10.1063/5.0200458. [DOI] [PubMed] [Google Scholar]
- Eastman P., Galvelis R., Peláez R. P., Abreu C. R. A., Farr S. E., Gallicchio E., Gorenko A., Henry M. M., Hu F., Huang J., Krämer A., Michel J., Mitchell J. A., Pande V. S., Rodrigues J. P., Rodriguez-Guerra J., Simmonett A. C., Singh S., Swails J., Turner P., Wang Y., Zhang I., Chodera J. D., De Fabritiis G., Markland T. E.. OpenMM 8: molecular dynamics simulation with machine learning potentials. J. Phys. Chem. B. 2023;128:109–116. doi: 10.1021/acs.jpcb.3c06662. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Case D. A., Aktulga H. M., Belfon K., Cerutti D. S., Cisneros G. A., Cruzeiro V. W. D., Forouzesh N., Giese T. J., Götz A. W., Gohlke H., Izadi S., Kasavajhala K., Kaymak M. C., King E., Kurtzman T., Lee T. S., Li P., Liu J., Luchko T., Luo R., Manathunga M., Machado M. R., Nguyen H. M., O’Hearn K. A., Onufriev A. V., Pan F., Pantano S., Qi R., Rahnamoun A., Risheh A., Schott-Verdugo S., Shajan A., Swails J., Wang J., Wei H., Wu X., Wu Y., Zhang S., Zhao S., Zhu Q., Cheatham T. E., Roe D. R., Roitberg A., Simmerling C., York D. M., Nagan M. C., Merz K. M.. AmberTools. J. Chem. Inf. Model. 2023;63:6183–6191. doi: 10.1021/acs.jcim.3c01153. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mark P., Nilsson L.. Structure and Dynamics of the TIP3P, SPC, and SPC/E Water Models at 298 K. J. Phys. Chem. A. 2001;105:9954–9960. doi: 10.1021/jp003020w. [DOI] [Google Scholar]
- Cieplak P., Caldwell J., Kollman P.. Molecular mechanical models for organic and biological systems going beyond the atom centered two body additive approximation: aqueous solution free energies of methanol and N-methyl acetamide, nucleic acid base, and amide hydrogen bonding and chloroform/water partition coefficients of the nucleic acid bases. J. Comput. Chem. 2001;22:1048–1057. doi: 10.1002/jcc.1065. [DOI] [Google Scholar]
- Åqvist J., Wennerström P., Nervall M., Bjelic S., Brandsdal B. O.. Molecular dynamics simulations of water and biomolecules with a Monte Carlo constant pressure algorithm. Chem. Phys. Lett. 2004;384:288–294. doi: 10.1016/j.cplett.2003.12.039. [DOI] [Google Scholar]
- Darden T., York D., Pedersen L.. Particle mesh Ewald: An N·log(N) method for Ewald sums in large systems. J. Chem. Phys. 1993;98:10089–10092. doi: 10.1063/1.464397. [DOI] [Google Scholar]
- Fox T., Kollman P. A.. Application of the RESP methodology in the parametrization of organic solvents. J. Phys. Chem. B. 1998;102:8070–8079. doi: 10.1021/jp9717655. [DOI] [Google Scholar]
- Cancès E., Mennucci B., Tomasi J.. A new integral equation formalism for the polarizable continuum model: Theoretical background and applications to isotropic and anisotropic dielectrics. J. Chem. Phys. 1997;107:3032–3041. doi: 10.1063/1.474659. [DOI] [Google Scholar]
- Szczesniak M., Nowak M., Rostkowska H., Szczepaniak K., Person W. B., Shugar D.. Matrix isolation studies of nucleic acid constituents. 1. Infrared spectra of uracil monomers. J. Am. Chem. Soc. 1983;105:5969–5976. doi: 10.1021/ja00357a002. [DOI] [Google Scholar]
- Aamouche A., Ghomi M., Coulombeau C., Jobic H., Grajcar L., Baron M., Baumruk V., Turpin P., Henriet C., Berthier G.. Neutron inelastic scattering, optical spectroscopies and scaled quantum mechanical force fields for analyzing the vibrational dynamics of pyrimidine nucleic acid bases. 1. Uracil. J. Phys. Chem. 1996;100:5224–5234. doi: 10.1021/jp952485x. [DOI] [Google Scholar]
- Xu R., Yang Q., Bloino J., Biczysko M.. Reliable Modeling of Anharmonic Spectra Line-Shapes from VPT2 and Hybrid QM Models: IR Spectrum of Uracil as a Test Case. J. Phys. Chem. A. 2025;129:5860. doi: 10.1021/acs.jpca.5c02226. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ivanov A. Y., Plokhotnichenko A. M., Radchenko E. D., Sheina G. G., Blagoi Y. P.. FTIR spectroscopy of uracil derivatives isolated in Kr, Ar and Ne matrices: matrix effect and Fermi resonance. J. Mol. Struct. Theochem. 1995;372:91–100. doi: 10.1016/S0166-1280(05)80001-6. [DOI] [Google Scholar]
- Barnes A., Stuckey M., Le Gall L.. Nucleic acid bases studied by matrix isolation vibrational spectroscopy: uracil and deuterated uracils. Spectrochim. Acta A Mol. Spect. 1984;40:419–431. doi: 10.1016/0584-8539(84)80073-2. [DOI] [Google Scholar]
- Ataka S., Takeuchi H., Tasumi M.. Infrared studies of the less stable cis form of N-methylformamide and N-methylacetamide in low-temperature nitrogen matrices and vibrational analyses of the trans and cis forms of these molecules. J. Mol. Struct. 1984;113:147–160. doi: 10.1016/0022-2860(84)80140-4. [DOI] [Google Scholar]
- Song S., Asher S. A., Krimm S., Bandekar J.. Assignment of a new conformation-sensitive UV resonance Raman band in peptides and proteins. J. Am. Chem. Soc. 1988;110:8547–8548. doi: 10.1021/ja00233a042. [DOI] [Google Scholar]
- DeCamp M., DeFlores L., McCracken J., Tokmakoff A., Kwac K., Cho M.. Amide I vibrational dynamics of N-methylacetamide in polar solvents: The role of electrostatic interactions. J. Phys. Chem. B. 2005;109:11016–11026. doi: 10.1021/jp050257p. [DOI] [PubMed] [Google Scholar]
- Hou Y. F., Zhang L., Zhang Q., Ge F., Dral P. O.. Physics-informed active learning for accelerating quantum chemical simulations. J. Chem. Theory Comput. 2024;20:7744–7754. doi: 10.1021/acs.jctc.4c00821. [DOI] [PubMed] [Google Scholar]
- Zhang H., Juraskova V., Duarte F.. Modelling chemical processes in explicit solvents with machine learning potentials. Nat. Commun. 2024;15:6114. doi: 10.1038/s41467-024-50418-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gaigeot M.-P.. Unravelling the conformational dynamics of the aqueous alanine dipeptide with first-principle molecular dynamics. J. Phys. Chem. B. 2009;113:10059–10062. doi: 10.1021/jp903745r. [DOI] [PubMed] [Google Scholar]
- Lee M.-E., Lee S. Y., Joo S.-W., Cho K.-H.. Amide I bands of terminally blocked alanine in solutions investigated by infrared spectroscopy and density functional theory calculation: Hydrogen-bonding interactions and solvent effects. J. Phys. Chem. B. 2009;113:6894–6897. doi: 10.1021/jp810153w. [DOI] [PubMed] [Google Scholar]
Data Availability Statement
All data and scripts for this work are available in a Zenodo repository (DOI: 10.5281/zenodo.18391996). In particular, we provide the data sets at both levels of theory, the Python scripts for generating new geometries (including normal-mode displacements and artificial environment configurations), the trained models for the three studied molecules, and the dipole moment data used for the IR spectra computations. Software: ML-server: https://github.com/Molecolab-Pisa/ML-server; GPX: https://github.com/Molecolab-Pisa/GPX (permut_symm branch); Moldex: https://github.com/Molecolab-Pisa/moldex.

