. Author manuscript; available in PMC: 2025 Dec 17.
Published in final edited form as: J Chem Theory Comput. 2025 Dec 10;21(24):12709–12724. doi: 10.1021/acs.jctc.5c01400

A universal augmentation framework for long-range electrostatics in machine learning interatomic potentials

Dongjin Kim 1, Xiaoyu Wang 1, Santiago Vargas 2, Peichen Zhong 3, Daniel S King 3, Theo Jaffrelot Inizan 3, Bingqing Cheng 1,2,3,4,5,*
PMCID: PMC12707604  NIHMSID: NIHMS2128398  PMID: 41368735

Abstract

Most current machine learning interatomic potentials (MLIPs) rely on short-range approximations, without explicit treatment of long-range electrostatics. To address this, we recently developed the Latent Ewald Summation (LES) method, which infers electrostatic interactions, polarization, and Born effective charges (BECs) simply by learning from energy and force training data. Here, we present LES as a standalone library, compatible with any short-range MLIP, and demonstrate its integration with methods such as MACE, NequIP, Allegro, CACE, CHGNet, and UMA. We benchmark LES-enhanced models on distinct systems, including bulk water, polar dipeptides, and gold dimer adsorption on defective substrates, and show that LES not only captures the correct electrostatics but also improves accuracy. Additionally, we scale LES to large and chemically diverse data by training MACELES-OFF on the SPICE set containing molecules and clusters, yielding a universal MLIP with electrostatics for organic systems, including biomolecules. MACELES-OFF is more accurate than its short-range counterpart (MACE-OFF) trained on the same dataset, predicts dipoles and BECs reliably, and better describes bulk liquids. By enabling efficient long-range electrostatics without directly training on electrical properties, LES paves the way for electrostatic foundation MLIPs.

I. INTRODUCTION

The short-range approximation underlies most established machine learning interatomic potentials (MLIPs) [1, 2], as it enables the decomposition of total energy into individual contributions from local atomic environments, allowing efficient learning and inference. However, it is increasingly recognized that explicitly modeling long-range interactions is essential for systems with significant electrostatics and dielectric response, such as electrochemical interfaces [3], charged molecules [4, 5], and ionic [6] and polar materials [7].

While various approaches have been developed to incorporate long-range interactions into MLIPs [4, 5, 7–18], many require specialized training labels beyond energies and forces: the fourth-generation high-dimensional neural network potentials (4G-HDNNPs) [9], the charge-constrained ACE model [19], and BAMBOO [12] are trained to reproduce atomic partial charges from reference quantum mechanical calculations; the self-consistent field neural network (SCFNN) [10] and the deep potential long-range (DPLR) model [6] learn from the positions of maximally localized Wannier centers (MLWCs) for insulating systems; and PhysNet [8], AIMNET2 [20], and SO3LR [21] utilize the dipole moments of gas-phase molecules. These additional data requirements limit the applicability of such methods to standard datasets containing only atomic positions, energies, forces, and sometimes stresses.

By contrast, a few methods can learn long-range effects directly from standard datasets [4, 5, 7, 14, 15, 22]. These include message-passing schemes such as Ewald message passing [15] and RANGE [18], the Coulomb-inspired long-range message-passing model FeNNix [23], and descriptor-based approaches like LODE [4, 5, 24] and the density-based long-range descriptor [22]. However, in these methods the long-range contributions are not explicitly related to electrostatics arising from charged atoms, making it difficult to extract electrical response properties.

The Latent Ewald Summation (LES) framework [25–27] overcomes these limitations by inferring atomic charges and the resulting long-range electrostatics directly from total energy and force data, without training on explicit charge labels. LES can thus be trained on any dataset used for short-ranged MLIPs. The inference of atomic charges not only enhances physical interpretability, but also enables the extraction of the polarization (P) and Born effective charges (BECs) [27]. These quantities allow the computation of electrical response properties, including dielectric constants, ionic conductivities, ferroelectricity, and infrared (IR) spectra, and also enable MLIP-based molecular dynamics under applied electric fields.

Here, we introduce LES as a universal electrostatics augmentation framework for short-ranged MLIPs. Unlike prior methods that are tightly integrated with specific architectures, LES is implemented as a standalone library that can be combined with a wide range of MLIP models [28–41]. We provide open-source patches for several MLIP packages (MACE [33], NequIP [32], CACE [34], MatGL [36], Allegro [42], and UMA [43]), enabling seamless integration. We benchmark and compare LES-augmented models across diverse systems with the baseline short-ranged MLIPs, demonstrating improved accuracy and correct long-range electrostatics through the inference of BECs.

Additionally, to demonstrate scalability to large, chemically diverse datasets, we train MACELES-OFF, an MLIP with explicit long-range electrostatics for organic molecules, using the SPICE dataset [44]. We find that MACELES-OFF outperforms its short-range counterpart (MACE-OFF [45]) and accurately predicts BECs, despite never being trained on charge or polarization data.

II. ARCHITECTURE

The integration of the LES library with a host MLIP is illustrated in Fig. 1. The black box outlines the standard workflow of a short-range MLIP: given a molecular or periodic system, the base MLIP computes invariant and/or equivariant features for each atom i, based on either local atomic environment descriptors [28, 31, 46] or message-passing architectures [47]. The local invariant descriptors Bi are mapped via a neural network to atomic energies Ei, which are then summed to obtain the total short-range energy $E_{sr} = \sum_{i=1}^{N} E_i$.
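The short-range branch described above can be sketched in a few lines of PyTorch; the shapes, layer sizes, and random feature tensor below are illustrative placeholders, not the readout of any specific MLIP:

```python
import torch

# Minimal sketch of the short-range branch of an MLIP (hypothetical shapes):
# per-atom invariant features B_i are mapped to atomic energies E_i by a small
# neural network, then summed to give E_sr = sum_i E_i.
torch.manual_seed(0)

n_atoms, n_features = 64, 16
B = torch.randn(n_atoms, n_features)          # stand-in invariant descriptors B_i

energy_head = torch.nn.Sequential(            # B_i -> E_i readout network
    torch.nn.Linear(n_features, 32),
    torch.nn.SiLU(),
    torch.nn.Linear(32, 1),
)

E_i = energy_head(B).squeeze(-1)              # atomic energies, shape (n_atoms,)
E_sr = E_i.sum()                              # total short-range energy (scalar)
print(E_sr.shape)
```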

Figure 1.

Schematic illustration of the LES integration with a short-ranged MLIP. The black box shows a standard MLIP workflow, where invariant or equivariant atomic features are computed, the invariant features Bi are mapped to atomic energies Ei and summed to yield the short-range energy Esr. The red box shows the LES module, which either predicts latent charges qiles from atomic features Bi or receives them from the host MLIP. Based on atomic positions ri and simulation cell size L, LES then computes the long-range energy Elr via Ewald summation or pairwise interactions. Optionally, LES also enables the computation of Born effective charges (BECs) as shown in the shaded red region. The blue arrows indicate auto-differentiation operations that are used to compute forces Fi or BECs.

The red box in Fig. 1 shows the LES module, which is written in PyTorch; the black arrows crossing the two boxes indicate communication between LES and the host MLIP. The LES module can either take the Bi features and predict the latent charge qiles of each atom i via a neural network, or the charge prediction can happen inside the MLIP and be passed to LES. These qles charges are then used to compute the long-range energy contribution Elr. For periodic systems, the Ewald summation is used:

$$E_{lr} = \frac{1}{2\varepsilon_0 V} \sum_{0 < |\mathbf{k}| < k_c} \frac{1}{k^2}\, e^{-\sigma^2 k^2/2}\, \left| S(\mathbf{k}) \right|^2, \qquad (1)$$

with

$$S(\mathbf{k}) = \sum_{i=1}^{N} q_i^{les}\, e^{i \mathbf{k} \cdot \mathbf{r}_i}, \qquad (2)$$

where ε0 is the vacuum permittivity, k is the reciprocal wave vector determined by the periodic cell L of the system, V is the cell volume, and σ is a smearing factor (typically chosen to be 1 Å). For finite systems, the long-range energy is simply computed using the pairwise direct sum:

$$E_{lr} = \frac{1}{2} \frac{1}{4\pi\varepsilon_0} \sum_{i=1}^{N} \sum_{j \neq i}^{N} \left[ 1 - \varphi(r_{ij}) \right] \frac{q_i q_j}{r_{ij}}, \qquad (3)$$

where $\varphi(r) = \operatorname{erfc}\!\left( r / (\sqrt{2}\,\sigma) \right)$ is the complementary error function.
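Eqs. (1)–(3) can be sketched as follows. This is a minimal PyTorch illustration with toy unit conventions (eps0 = 1) and a small fixed k-space grid; the LES library's actual implementation handles units, cutoffs, and batching internally, so none of the parameter values below should be read as its defaults:

```python
import itertools
import math
import torch

def ewald_reciprocal_energy(q, r, cell, sigma=1.0, k_cutoff=2.0 * math.pi):
    """Periodic case, Eqs. (1)-(2): sum over reciprocal vectors 0 < |k| < k_c."""
    eps0 = 1.0                                            # illustrative units only
    V = torch.abs(torch.linalg.det(cell))
    recip = 2.0 * math.pi * torch.linalg.inv(cell).T      # reciprocal lattice (rows)
    E = torch.zeros(())
    nmax = 4                                              # small integer grid for the sketch
    for n in itertools.product(range(-nmax, nmax + 1), repeat=3):
        if n == (0, 0, 0):
            continue
        k = torch.tensor(n, dtype=r.dtype) @ recip
        k2 = k @ k
        if k2 > k_cutoff**2:
            continue
        phase = r @ k                                     # k . r_i for each atom
        Sk2 = (q * torch.cos(phase)).sum() ** 2 + (q * torch.sin(phase)).sum() ** 2
        E = E + torch.exp(-sigma**2 * k2 / 2.0) / k2 * Sk2   # |S(k)|^2 term
    return E / (2.0 * eps0 * V)

def pairwise_long_range_energy(q, r, sigma=1.0):
    """Finite case, Eq. (3); note 1 - erfc(x) = erf(x)."""
    eps0 = 1.0                                            # illustrative units only
    rij = torch.cdist(r, r)                               # all pair distances
    mask = ~torch.eye(len(q), dtype=torch.bool)           # exclude self-interaction
    screened = torch.erf(rij[mask] / (math.sqrt(2.0) * sigma))
    qq = (q[:, None] * q[None, :])[mask]
    return 0.5 / (4.0 * math.pi * eps0) * (screened * qq / rij[mask]).sum()

# Toy example: a +1/-1 charge pair.
q = torch.tensor([1.0, -1.0])
r = torch.tensor([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
E_periodic = ewald_reciprocal_energy(q, r, torch.eye(3) * 10.0)
E_finite = pairwise_long_range_energy(q, r)
print(float(E_periodic), float(E_finite))
```

At large separation the screened pairwise sum recovers the bare Coulomb interaction, since erf(r/(√2σ)) → 1.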

The Elr computed in LES is returned to the host MLIP package to be added to the total potential energy, E=Esr+Elr. Inside the host MLIP, forces and stresses are obtained through automatic differentiation of the total energy with respect to atomic positions and cell strain, respectively. The training procedure remains unchanged from the original MLIP package, using conventional loss functions for energy, forces, and stresses with user-defined weights.
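The autodiff force evaluation described above can be illustrated with a toy total energy standing in for a trained MLIP; the functional form and geometry below are purely illustrative:

```python
import torch

# Forces are the negative gradient of the total energy E = E_sr + E_lr with
# respect to atomic positions, obtained via automatic differentiation.
# A toy two-atom energy stands in for a real MLIP here.
r = torch.tensor([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]], requires_grad=True)

def toy_total_energy(r):
    d = torch.linalg.norm(r[0] - r[1])
    E_sr = (d - 1.0) ** 2          # stand-in short-range term
    E_lr = -1.0 / d                # stand-in long-range term
    return E_sr + E_lr

E = toy_total_energy(r)
forces = -torch.autograd.grad(E, r)[0]
print(forces)                      # equal and opposite (Newton's third law)
```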

Optionally, as indicated by the red shaded area in Fig. 1, LES can predict the Born effective charge tensors Zαβ*, where α and β indicate Cartesian directions. The theory is described in Ref. [27]. In brief, for a finite system such as a gas-phase molecule, the BEC for atom i is

$$Z^{*}_{i,\alpha\beta} = \frac{\partial P_\alpha}{\partial r_{i\beta}}, \qquad (4)$$

where $\mathbf{P} = \sum_{i=1}^{N} q_i \mathbf{r}_i$ is the polarization, i.e., the dipole moment. For a homogeneous bulk system under periodic boundary conditions (PBCs),

$$Z^{*}_{i,\alpha\beta} = \lim_{k \to 0} \operatorname{Re}\left[ e^{-i k r_{i\alpha}}\, \frac{\partial P_\alpha(k)}{\partial r_{i\beta}} \right], \qquad (5)$$

with $P_\alpha(k) = \sum_{i=1}^{N} \frac{\varepsilon_\infty\, q_i^{les}}{ik}\, e^{i k r_{i\alpha}}$. In practice, we set k to the smallest reciprocal lattice vector, which is justified by the convergence tests using replicated cells presented in Methods. The high-frequency (electronic) relative permittivity ε∞ is an extra system parameter that can be easily obtained from experimental measurements, such as the refractive index, or from density-functional perturbation theory (DFPT) calculations [48] with frozen nuclei.
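For the finite-system case of Eqn. (4), the BEC tensors can be obtained by automatic differentiation of the polarization, as in this minimal sketch with fixed illustrative charges; in LES the charges themselves depend on atomic positions through the learned features, and autograd captures that dependence as well:

```python
import torch

# Sketch of Eq. (4): the BEC tensor of atom i is the Jacobian of the
# polarization P = sum_i q_i r_i with respect to r_i.  Charges are held
# fixed here for illustration, so Z[i] reduces to q_i times the identity.
q = torch.tensor([0.8, -0.8])
r = torch.tensor([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], requires_grad=True)

P = (q[:, None] * r).sum(dim=0)                 # polarization, shape (3,)

Z = torch.zeros(len(q), 3, 3)                   # Z[i, alpha, beta] = dP_alpha / dr_{i beta}
for alpha in range(3):
    grad = torch.autograd.grad(P[alpha], r, retain_graph=True)[0]
    Z[:, alpha, :] = grad

print(Z[0])                                     # for fixed charges: q_0 * identity
```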

Overall, the computation inside LES is light, and the communication between LES and the host MLIP is minimized. This design allows the LES augmentation to be implemented as a drop-in module that preserves the architecture, training, and inference workflows of the host MLIP package. As long as the host MLIP is implemented in PyTorch and produces per-atom features used for local energy prediction, LES can be patched in with minimal effort.

Thus far, we have integrated LES into several representative MLIPs with distinct architectures, which we briefly describe here. NequIP [32, 49] is a message passing neural network (MPNN) with E(3)-equivariant convolutions; it typically uses 3–5 message passing layers with a perceptive field that can be as large as about 20 Å. Allegro [42, 49] is a scalable, strictly local many-body E(3)-equivariant deep neural network interatomic potential that circumvents the need for atom-centered message passing. The architecture learns scalar and tensorial features associated with directed edges in the atomic graph, enabling prediction of pairwise energies and subsequent reconstruction of per-atom energies. CACE [34] is based on the atomic cluster expansion (ACE) [31]. With one layer (nl=1), CACE is a local atomic descriptor-based model employing polynomial expansions of atomic clusters with body order ν in Cartesian coordinates. When ν=2, three-body interactions are captured and (C)ACE has the same expressive power as earlier MLIP methods, including HDNNPs [28], Gaussian Approximation Potentials [29], and DeepMD [6]. MACE [33] is an MPNN that combines equivariant representations with the physically grounded structure of atomic cluster expansions. Thanks to its high-body-order messages, typically only one message passing layer on top of the base ACE layer is needed. MatGL is a graph deep learning library for materials modeling [36]. Instead of using equivariant message passing, MatGL utilizes node and line graph representations to incorporate three-body interactions, as in M3GNet [50] and CHGNet [51]. UMA is an equivariant MPNN designed for large datasets such as OMol25 [52] and OMat24 [53]. UMA extends several ideas from the preceding eSEN [37] architecture, including the eSCN [54] convolution, added envelope functions for smoothness, and non-discrete node representations. UMA further includes Mixture of Linear Experts (MoLE) [43] and global embeddings of charge, spin, and task inputs. These models span a wide range of architectural paradigms (descriptor-based versus message-passing, invariant versus equivariant representations) and demonstrate that LES can be universally incorporated without requiring architecture-specific modifications.

III. BENCHMARKS

Here, we benchmark the performance of the aforementioned MLIP architectures (MACE, NequIP, CACE, CHGNet, and Allegro) with and without LES augmentation across three distinct systems, exemplifying the key challenges and practical importance of long-range electrostatic interactions. We also tested UMA and UMA-LES, but found the baseline UMA model to perform substantially worse than the other baseline models despite exhaustive hyperparameter tuning, presumably because UMA is designed for large models fitted to large datasets and is not tuned for small ones. In the bulk water system, while short-range interactions play an important role in the hydrogen-bond network and structural properties [55], long-range electrostatics are crucial for its dielectric response and interfacial behavior [56]. The polar dipeptides represent gas-phase systems with strong multipole interactions [44]. The Au2-MgO(001) system involves the adsorption of metallic species on oxide substrates, where long-range interactions critically affect adsorption energy and charge redistribution, particularly upon substrate doping [9, 26]. In addition to standard metrics such as energy and force root mean square errors (RMSEs), we also evaluate the prediction of physical observables (BECs and adsorption energies) not in the training set.

A. Water

We trained baseline and LES-augmented MLIPs using the energy and forces from the RPBE-D3 bulk water dataset [57], which contains 604 training and 50 test configurations, each of 64 water molecules. The configurations were generated from MD trajectories at temperatures ranging from 270 K to 420 K at the room-temperature water density (997 kg/m³), using an on-the-fly learning scheme. For testing LES, we also used the BEC values for an additional 100 snapshots of bulk water at experimental density and room temperature [57], also computed with RPBE-D3 DFT. We assumed the experimental high-frequency permittivity of water (ϵ∞ = 1.78) when predicting BECs based on Eqn. (5).

Fig. 2c shows force RMSEs for each MLIP architecture; the energy RMSE values remain uniformly low (0.1–0.3 meV/atom) except for CHGNet (0.8–0.9 meV/atom) and are thus omitted here. For the baseline MLIPs, indicated by the hollow bars in Fig. 2c, we observe the usual trend expected from short-ranged models: without message-passing layers (nl=1), increasing the cutoff r improves the accuracy (e.g., comparing the CACE ν=2 models with r=4.5 Å and r=5.5 Å), and so does boosting the body order (CACE r=4.5 Å, ν=2 versus ν=3). More layers (nl) effectively increase the perceptive field of the short-ranged MLIPs and reduce the errors, for both equivariant MPNNs (MACE and NequIP) and the invariant CHGNet. Increasing the rotation order (ℓ) in E(3) equivariant representations of messages also helps the accuracy, as seen in the MACE, NequIP, and Allegro models.

Figure 2.

Benchmark of short-ranged and LES-augmented MLIPs for bulk water. a: A representative snapshot of bulk water. b: A parity plot comparing Born effective charge (BEC) tensors Z* for 100 bulk water configurations from LES-augmented NequIP (r=4.5 Å, nl=3, ℓ=2) against RPBE-D3 DFT (Ref. [57]). The main panel compares the diagonal elements of the BEC (Zαα*), and the inset shows the off-diagonal elements Zαβ* with α≠β. c: The force root mean square errors (RMSEs) for baseline MLIPs (hollow bars), and with LES augmentation (solid bars). d: The RMSEs for BEC predictions using LES and the corresponding R² coefficients. The cutoff r, the number of layers nl, the rotation order ℓ of the irreducible representations in message passing, and the body order ν are indicated for each MLIP, as applicable.

The solid bars in Fig. 2c show the force RMSEs for the LES-augmented models. Comparing the solid bars with the hollow ones, LES reduces the errors for all models, while the extent of improvement depends somewhat on the architecture of the short-ranged baseline. For this bulk water dataset, LES most benefits the MLIPs with a smaller effective perceptive field. That is, the MACE and CACE models without message passing (nl=1) see the largest force error reductions, while NequIP and CHGNet with several message-passing layers show modest reductions. Similarly, the non-message-passing architecture Allegro also shows a meaningfully large reduction, consistent with this trend; note that its number of layers (nl) corresponds to the body order (e.g., nl=3 represents five-body tensor features) rather than message-passing depth. The other hyperparameters of the base MLIPs, ν and ℓ, do not seem to have a strong influence on the efficacy of LES.

Moreover, we ask: Is LES able to capture the correct electrostatic interactions and predict physical atomic charges? This is answered via the BEC benchmark shown in Fig. 2d, which presents the RMSEs of the BECs compared to the DFT-calculated values. Importantly, for all MLIPs, regardless of model architecture, hyperparameters, or effective cutoffs, LES can accurately predict the BECs. Fig. 2b shows the parity plot for one of the models, which illustrates the quality of the agreement with the DFT ground truth. We do not find any obvious correlation between the accuracy of the BEC prediction and the baseline MLIP architecture, nor with the force RMSEs, although there is a weak inverse correlation with the reduction in force RMSE (see Fig. 11 in Methods). This suggests that LES is a robust approach to infer BECs, regardless of the underlying model performance. The BECs are not only sensitive indicators of correct electrostatics, but are also useful in atomistic simulations of electrical response properties. For instance, in Ref. [27], we showed that such BEC predictions enable accurate prediction of water IR spectra under zero or finite external electric fields.

Figure 11.

Correlation analysis between force and Born effective charge (BEC) RMSEs for LES-augmented MLIPs. a, b: Correlation between force RMSE and BEC RMSE for water (a) and dipeptide (b) systems. c, d: Correlation between force RMSE reduction after LES augmentation and BEC RMSE for water (c) and dipeptide (d) systems.

B. Dipeptides

We trained baseline and LES-augmented MLIPs using energy and forces from a dataset of polar dipeptide conformers selected from SPICE [26, 44], which contains 550 configurations (90% train/10% test split). For testing LES, we computed BEC values at the ωB97M-D3(BJ)/def2-SVP level of theory. As these systems are in a vacuum, the high-frequency permittivity is ϵ∞=1, and one can use Eqn. (4) directly when predicting BECs with LES.

Figs. 3c and d show the energy and force RMSE values for each MLIP architecture. For the baseline MLIPs, indicated by the hollow bars, we again observe the usual trends: increasing the body order (CACE r=4.5 Å, ν=2 versus ν=3) and the number of layers (CHGNet r=4.5 Å, nl=3 versus nl=4) leads to improved accuracy. Additionally, increasing ℓ improves the MACE, NequIP, and Allegro models.

Figure 3.

Benchmark of short-ranged and LES-augmented MLIPs for dipeptides. a: A representative snapshot of a dipeptide conformer composed of arginine and aspartic acid from the SPICE dataset [44]. b: A parity plot comparing Born effective charge (BEC) tensors for 55 dipeptide configurations [44] from the LES-augmented NequIP (r=4.5 Å, nl=3, ℓ=2) model against DFT. The main panel compares the diagonal elements of the BEC (Zαα*), and the inset shows the off-diagonal elements (Zαβ* with α≠β). c, d: The energy (c) and force (d) root mean square errors (RMSEs) for baseline MLIPs (hollow bars), and with LES augmentation (solid bars). e: The RMSEs for BEC predictions using LES and the corresponding R² coefficients. The cutoff r, the number of layers nl, the rotation order ℓ of the irreducible representations of messages, and the body order ν are indicated for each MLIP, as applicable.

The solid bars in Figs. 3c and d represent the energy and force RMSEs for the LES-augmented models. Comparing the solid and hollow bars, we find again that LES augmentation consistently reduces both energy and force prediction errors across all baseline MLIPs, regardless of their specifics: hyperparameters such as r and ν do not appear to strongly influence the effectiveness of LES, nor do the perceptive fields of baseline models. This implies that LES provides an improved molecular representation on top of message passing.

As seen in Fig. 3e, the LES-augmented models all show excellent agreement with DFT reference values for BEC predictions (up to R²=0.97). As an illustration, Fig. 3b shows a parity plot. We also investigated whether the quality of the BEC predictions correlates with the force RMSE values or with the RMSE reduction, as shown in Figs. 11b and d. There are no clear-cut correlations between BEC and force or energy RMSE improvement, showing that LES remains a robust way to infer charges regardless of the relative performance of the baseline model. As shown in our previous study, the ability of LES to capture electrostatic interactions also extends to dipole moments, quadrupole moments, and IR spectra [26]. Electrostatics strongly affect peptide backbone conformations in dipeptides [58], which suggests that LES augmentation can be helpful for modeling protein structure, folding, binding, and other biological functions.

C. Au2 on MgO(001)

We trained baseline and LES-augmented MLIPs using energy and forces from an Au2-MgO(001) dataset from Ko et al. [9], consisting of 5000 configurations (90% train/10% test split). The configurations consist of a gold dimer adsorbed on an MgO(001) surface in two adsorption geometries: an upright non-wetting and a parallel wetting configuration. In some configurations, three Al dopant atoms were introduced in the fifth subsurface layer (Fig. 4a). Despite being over 10 Å away from the gold dimer, the dopant atoms significantly influence the electronic structure and the energetics of the wetting versus non-wetting configurations.

Figure 4.

Benchmark of short-ranged and LES-augmented MLIPs for Au2 on MgO(001). a: Representative configurations for wetting (top) and non-wetting (bottom) Au2 adsorbed on doped (top) and undoped (bottom) MgO(001) surfaces. b, c: The energy (b) and force (c) root mean square errors (RMSEs) for baseline MLIPs (hollow bars), and with LES augmentation (solid bars). d: The energy difference (Ewetting − Enon-wetting) in eV for doped (top) and undoped (bottom) substrates. Baseline (hollow bars) and LES-augmented models (solid bars) are compared with reference DFT results (gray bars and horizontal dashed lines). The cutoff r, the number of layers nl, the rotation order ℓ of the irreducible representations of messages, and the body order ν are indicated for each MLIP, as applicable.

Figs. 4b and c present the energy and force RMSEs for each MLIP architecture. For all baseline MLIP models indicated by the hollow bars, increasing the perceptive field by either raising the cutoff radius or increasing the number of layers leads to significant improvements in both energy and force accuracy, for both the equivariant MPNNs (MACE, NequIP) and the invariant CHGNet. When the perceptive field is smaller than the distance between the dopants and the adsorbates, the MLIPs perform poorly.

The solid bars in Figs. 4b and c demonstrate that the LES augmentation rather consistently improves energy and force predictions across all models, with particularly large gains for models with a smaller perceptive field. For example, the force RMSE reductions for MACE (r=5.5 Å, nl=1), NequIP (r=4 Å, nl=2, ℓ=1), CHGNet (r=4.5 Å, nl=1), CACE (r=5.5 Å, nl=1, ν=3), CACE (r=5.5 Å, nl=1, ν=2), and Allegro (r=5.5 Å, nl=2, ℓ=1) are about 80%.

Fig. 4d shows the prediction of a physical observable: the energy difference (Ewetting − Enon-wetting) between wetting and non-wetting configurations for doped and undoped substrates. This is computed by performing geometry optimizations of the positions of the gold atoms, with the substrate fixed, for both doped and undoped surfaces. For the undoped MgO substrate, the non-wetting configuration is more stable, while Al-doping stabilizes the wetting configuration. As shown, the baseline MLIPs with perceptive fields smaller than about 9 Å struggle to distinguish between undoped and Al-doped structures. Either increasing the perceptive field or adding LES significantly enhances the accuracy of this energy difference, closely matching the DFT reference values of 934.8 meV for undoped and −66.9 meV for doped surfaces. Thus, although LES may not further improve the prediction for already highly accurate models, it ensures that long-range effects associated with doping and adsorption are captured in all cases.
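The relaxation protocol behind Fig. 4d (optimizing only the adsorbate positions while keeping the substrate frozen) can be sketched with a toy pair potential standing in for the trained MLIP; the geometry, species, and Lennard-Jones parameters below are illustrative, not those of the Au2/MgO system:

```python
import torch

# Only the adsorbate tensor carries gradients; the substrate layer is frozen,
# mimicking a constrained geometry optimization.  A toy Lennard-Jones energy
# replaces the trained MLIP for this sketch.
substrate = torch.tensor([[float(x), float(y), 0.0]
                          for x in range(4) for y in range(4)])   # frozen layer
adsorbate = torch.tensor([[1.2, 1.2, 1.6], [2.2, 1.2, 1.6]], requires_grad=True)

def lj(d, eps=0.2, sig=1.1):
    x = (sig / d) ** 6
    return 4.0 * eps * (x * x - x)

def toy_energy(ads):
    # adsorbate-substrate pairs plus the adsorbate-adsorbate pair
    return lj(torch.cdist(ads, substrate)).sum() + lj(torch.linalg.norm(ads[0] - ads[1]))

E_initial = toy_energy(adsorbate).detach()

opt = torch.optim.LBFGS([adsorbate], max_iter=100, line_search_fn="strong_wolfe")
def closure():
    opt.zero_grad()
    E = toy_energy(adsorbate)
    E.backward()
    return E
opt.step(closure)

E_relaxed = toy_energy(adsorbate).detach()
print(float(E_initial), float(E_relaxed))     # relaxation lowers the energy
```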

Notably, the CACE model with three-body interactions (r=5.5 Å, nl=1, ν=2) has the same body order and expressiveness as the HDNNPs [28]. Ko et al. [9] showed that the short-range HDNNP fails to distinguish the energetics of the doped and undoped systems, while the 4G-HDNNP, which was trained on DFT partial charges and uses a charge equilibration scheme, can reproduce the physical behavior. Our observation is consistent with theirs, except that LES requires neither training on DFT charges nor a charge equilibration scheme.

IV. A LONG-RANGE TRANSFERABLE MLIP FOR ORGANICS

In the above, we benchmarked LES on specific systems and relatively small datasets. Here, we explore whether the long-range correction scheme works at scale, for large and chemically diverse datasets. This is not only relevant for validating the scaling capability of the method, but is practically important for constructing emerging “foundation models” [59] that work across the periodic table.

MACE-OFF23 (small (S), medium (M), and large (L)) [45] is a recent family of short-range transferable machine learning force fields for organic molecules, based on the MACE architecture and trained on finite organic systems: small molecules of up to 50 atoms from about 85% of the SPICE dataset version 1 [44] with a neutral formal charge and containing ten chemical elements (H, C, N, O, F, P, S, Cl, Br, and I), as well as 50–90 atom molecules randomly selected from the QMugs dataset [60]. The dataset contains energies and forces computed at the ωB97M-D3(BJ)/def2-TZVPPD level of theory. We trained the MACELES-OFF model using the same training/validation split as MACE-OFF23.

We note that our main purpose here is to showcase the effect of the LES augmentation relative to the baseline model, rather than to fit the most accurate model possible. The MACELES-OFF model we trained used a cutoff radius r=4.5 Å, k=192 chemical channels, equivariant features of order ℓ=1, and float32 precision. A comparison of these hyperparameters with the original MACE-OFF23 models is presented in Table II. Compared to MACE-OFF23(S), MACELES-OFF has the same r but higher k and ℓ. Thanks to its lower precision as well as the small overhead of LES, MACELES-OFF is about as fast as MACE-OFF23(S), and much faster than the MACE-OFF23(M) and MACE-OFF23(L) models, as shown in the timing benchmark in Fig. 13 of Methods.

Table II.

Hyperparameters of the MACE-OFF and MACELES-OFF models. All MACE-OFF models (23(S), 23(M), 23(L), and 24(M)) are taken from Ref. [45]. 24(M) was trained on the updated SPICE version 2 [77].

                 23(S)    23(M)    23(M)b   23(L)    24(M)    LES
Cutoff r (Å)     4.5      5.0      6.0      5.0      6.0      4.5
Channels k       96       128      128      192      128      192
Irrep order ℓ    0        1        1        2        1        1
SPICE version    1        1        1        1        2        1
Precision        float64  float64  float64  float64  float64  float32

Figure 13.

Computational performance benchmarks of molecular dynamics (MD) simulations using the MACELES-OFF and the MACE-OFF23 [45] models. The timing of MD simulations for bulk liquid water with varying system sizes was performed on an NVIDIA L40S GPU. Benchmarks were performed with both single-precision (float32) and double-precision (float64) models using ASE and OpenMM implementations.

In Fig. 5, we compare the energy and force RMSE values for the three original MACE-OFF23 models (S, M, L) and MACELES-OFF on different subsets of test configurations (representative snapshots in Fig. 6a). Overall, MACELES-OFF shows RMSEs comparable to MACE-OFF23(L), which uses a larger cutoff r=5 Å and higher ℓ=2 (see Table II). Compared to the MACE-OFF23(S) model that shares the same cutoff, the errors are roughly halved. Amongst all the subsets, MACELES-OFF shows particularly better accuracy for water clusters, solvated amino acids, and dipeptides, which may indicate that the model is well-suited for simulating biological systems.

Figure 5.

Comparison of test set root mean square errors (RMSEs) for energy and forces of the MACE-OFF23 (S, M, L) and MACELES-OFF models for organic molecules, with the underlying DFT reference values [45]. Exact values are provided in Table III.

Figure 6.

Assessment of dipole moments and Born effective charges (BECs) predicted by the MACELES-OFF for subsets of the MACE-OFF test set. a: Representative configurations from each subset of the MACE-OFF test set. b: Parity plots comparing predicted dipole moment components (μ) against reference DFT values. c: Parity plots comparing predicted BEC tensors against DFT reference. Data points are colored by atomic number, with diagonal BEC components (Zαα*) shown in the main plots and off-diagonal Zαβ* components shown in inset plots.

To test whether LES extracts the correct electrostatics, we investigate whether the model can infer dipole moments μ (equivalent to the polarization for finite systems) and BECs. For about 50 configurations in each of the test subsets, we calculated reference BECs and dipole moments at the ωB97M-D3(BJ)/def2-SVP level of theory (calculation details in Methods). Fig. 6b shows good agreement between the reference and predicted dipole moments μ. Fig. 6c compares the DFT BECs with those computed by LES by differentiating the predicted μ as in Eqn. (4), showing broad agreement in both the diagonal and off-diagonal components across the range of elements and chemical species contained in the test set. These agreements demonstrate that accurate inference of dipole moments and BECs is achievable in foundation-like models trained only on energies and forces.

a. Water

A key requirement for organic force fields is the accurate description of bulk water. Note that the training set contains only small, non-periodic water clusters; applying the model to simulations of the bulk liquid is itself a generalization task.

We performed equilibrium NVT molecular dynamics simulations of bulk water at 300 K and experimental density, with further details provided in the Methods section. Fig. 7a demonstrates the capability of the MACELES-OFF model to accurately capture the radial distribution function (RDF) of water, and shows a result comparable to those from the previous MACE-OFF23 models (S, M) and experimental measurements [61].
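A minimal estimator for such radial distribution functions from a single periodic snapshot, under the minimum-image convention, is sketched below; random ideal-gas positions are used as stand-in data (so g(r) should fluctuate around 1), and the box size and bin settings are illustrative:

```python
import numpy as np

# Sketch of an RDF estimate from one periodic snapshot.  For ideal-gas
# positions, the normalized histogram should hover around g(r) = 1.
rng = np.random.default_rng(0)
L = 12.0                                   # cubic box edge, arbitrary units
pos = rng.uniform(0.0, L, size=(200, 3))

def rdf(pos, L, nbins=60, r_max=None):
    r_max = r_max or L / 2.0               # full spheres fit only up to L/2
    n = len(pos)
    diff = pos[:, None, :] - pos[None, :, :]
    diff -= L * np.round(diff / L)         # minimum-image convention
    d = np.linalg.norm(diff, axis=-1)[np.triu_indices(n, k=1)]
    hist, edges = np.histogram(d, bins=nbins, range=(0.0, r_max))
    r = 0.5 * (edges[1:] + edges[:-1])
    shell = 4.0 * np.pi * r**2 * (edges[1] - edges[0])   # shell volumes
    rho = n / L**3
    return r, hist / (shell * rho * n / 2.0)

r, g = rdf(pos, L)
print(g[-5:])                              # ~1 at large r for an ideal gas
```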

Figure 7.

Bulk water properties predicted by the MACELES-OFF compared to the MACE-OFF23 models and experimental data. a: Oxygen-oxygen radial distribution functions in bulk water from the MACE-OFF23 models (S, M) and the MACELES-OFF model compared with experimental values (Ref. [61]). b: The infrared (IR) absorption spectra of bulk liquid water computed using the MACELES-OFF (red). For comparison, we also show the experimental IR spectrum [62] (gray shading), the classical MD result from MACE-OFF23(S) (solid blue), and the result with quantum nuclear corrections to MACE-OFF23(S) (dashed blue), as reported in Ref. [45]. c: Density isobar of liquid water at 1 atm. Both the MACELES-OFF (red circles) and the MACE-OFF23(M) (green squares) models show similar isobaric density characteristics for liquid water at 1 atm, with experimental data [65] included for reference.

We computed an IR spectrum from the MD trajectory using the predicted BECs, as illustrated in Fig. 7b, with additional details given in the Methods section. For comparison, we also include the IR spectrum predicted using MACE-OFF23(S), computed from the autocorrelation function of the time derivative of the total dipole moment predicted by a separate MACE-OFF23-μ model along classical MD trajectories [45], as well as experimental results [62]. The intramolecular bending mode (1640 cm-1), the intermolecular low-frequency libration band (650 cm-1), and the hydrogen-bond translational stretching mode (200 cm-1) are all well captured by the MACELES-OFF, whereas the MACE-OFF23(S) classical MD result reproduces the band shapes but not the intensities. The predicted intramolecular OH stretching band (3400 cm-1) is blue-shifted relative to experiment, due to our use of classical MD without nuclear quantum effects (NQEs) [63]. To illustrate the impact of NQEs, we additionally show the IR spectrum obtained from MACE-OFF23(S) with path-integral MD (PIMD) simulations within the PIGS approximation [64], as reported in Ref. [45]. Incorporating NQEs brings the MACE-OFF23(S) frequencies into close agreement with experiment, especially by correcting the blue shift in the OH stretching region.
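The spectrum pipeline (Fourier transform of the autocorrelation of the dipole time derivative) can be sketched with a synthetic single-mode dipole trajectory standing in for MD output; the timestep and mode frequency below are illustrative, not those of the actual simulations:

```python
import numpy as np

# The IR absorption is proportional to the Fourier transform of the
# autocorrelation of mu_dot(t); a synthetic cosine dipole trace stands
# in for the BEC-derived dipole time series from MD.
dt = 0.5e-15                                   # 0.5 fs timestep (illustrative)
n_steps = 8192
t = np.arange(n_steps) * dt
freq_hz = 50.0e12                              # one synthetic mode, ~1670 cm^-1
mu = np.cos(2.0 * np.pi * freq_hz * t)         # stand-in total dipole component

mu_dot = np.gradient(mu, dt)                   # dipole time derivative
acf = np.correlate(mu_dot, mu_dot, mode="full")[n_steps - 1:]
spectrum = np.abs(np.fft.rfft(acf * np.hanning(len(acf))))
freqs_cm = np.fft.rfftfreq(len(acf), d=dt) / 2.99792458e10   # Hz -> cm^-1

peak = freqs_cm[np.argmax(spectrum)]
print(peak)                                    # near the synthetic mode frequency
```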

We simulated the temperature dependence of liquid water density in NPT simulations conducted at 1 atm, as shown in Fig. 7c. Comparative analysis demonstrates that MACELES-OFF achieves superior density prediction, overestimating the liquid water density by about 10%, compared to the 18% overestimation reported for the MACE-OFF23(M) model [45]. This improvement highlights the importance of long-range dipole-dipole interactions in predicting bulk properties such as density. We also note that the MACE-OFF model can predict water density closer to experimental values by further increasing the cutoff to 6Å, which corresponds to a receptive field of 12Å.

b. Molecular liquids

We investigated the performance of MACELES-OFF in predicting condensed-phase properties such as densities (ρliq) and enthalpies of vaporization (ΔvapH) under ambient conditions (298 K and 1 atm), as presented in Fig. 8. Reproducing such properties is known to be challenging for short-ranged MLIPs [45, 67]. We selected a comprehensive test set of 39 molecular liquids with boiling points above 320 K, following the MACE-OFF benchmark [45]. This set encompasses diverse chemical classes highly relevant to chemistry and biology, including alcohols, amines, amides, ethers, aromatics/hydrocarbons, ketones/aldehydes, thiols/sulfides, and other functionalized compounds.

Figure 8.

Predicted densities and heats of vaporization for diverse molecular liquids under ambient conditions (298 K and 1 atm) using the MACELES-OFF. a: Density (ρliq) predictions from the MACELES-OFF (red circles) and the MACE-OFF23(M) (green squares), compared with experimental values [66]. b: Heats of vaporization (ΔvapH) predictions from the MACELES-OFF (red circles) and the MACE-OFF23(M) (green squares), compared with experimental values [66].

As shown in Fig. 8, the MACELES-OFF model demonstrates superior accuracy in predicting liquid density and enthalpy of vaporization compared to the MACE-OFF23(M). For liquid density, the MACELES-OFF achieves an MAE of 0.04g/cm3 and RMSE of 0.05g/cm3, which represents approximately half the error values of the MACE-OFF23(M) (MAE: 0.09g/cm3, RMSE: 0.15g/cm3). These low error values, coupled with a stronger correlation with experimental data (R=0.96) compared to the MACE-OFF23(M) model (R=0.89), indicate that the MACELES-OFF accurately learned both short- and long-range intermolecular interactions.
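
The error metrics quoted throughout this section follow the standard definitions; for completeness, a generic implementation (not tied to our analysis scripts) is:

```python
import numpy as np

def mae(pred, ref):
    """Mean absolute error."""
    return float(np.mean(np.abs(pred - ref)))

def rmse(pred, ref):
    """Root mean square error."""
    return float(np.sqrt(np.mean((pred - ref) ** 2)))

def pearson_r(pred, ref):
    """Pearson correlation coefficient R."""
    return float(np.corrcoef(pred, ref)[0, 1])
```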

For ΔvapH, the MACELES-OFF model yields an MAE of 1.32 kcal/mol and an RMSE of 1.62 kcal/mol with a correlation of R=0.84, outperforming the MACE-OFF23(M) model (MAE: 2.18 kcal/mol; RMSE: 2.53 kcal/mol; R=0.87) in both error metrics. Moreover, the systematic ΔvapH offset of about 2 kcal/mol exhibited by the MACE-OFF23(M) model is largely eliminated in the MACELES-OFF predictions.

c. Simulations of biological molecules

To probe the influence of the long-range interactions in MACELES-OFF on protein-protein and protein-water interactions, we studied two biological systems: solvated alanine dipeptide and the 1FSV protein [68].

For the alanine dipeptide in water, we computed the free energy surface (FES) using well-tempered metadynamics [69] simulations in explicit solvent (see Methods Section), as shown in Fig. 9. The AMBER ff99SB-ILDN force field exhibits an FES with four local minima, consistent with previous studies [23, 70]: antiparallel β-sheet (aβ), polyproline II (PPII), right-handed α-helix (α-H(R)), and left-handed α-helix (α-H(L)). In comparison, MACELES-OFF displays the deepest minimum at α-H(R). The free energy difference between α-H(R) and the second minimum (aβ) is approximately 1.5 kcal/mol, while the difference with PPII is around 3.0 kcal/mol, consistent with the relative ordering of the minima and similar to the relative free energies reported at the MP2/cc-pVTZ level with implicit solvent [71]. Compared to the recent FENNIX-Bio1 model [23], which includes long-range interactions and nuclear quantum effects and is trained on the high-level SPICE2(+)-ccECP dataset, MACELES-OFF yields a very similar overall shape of the FES. Interestingly, the right-handed α-helix region (αD in Fig. 9b) does not appear as a clear minimum but rather as a shallow valley on the FES from either AMBER ff99SB-ILDN or AMBER99 [23], likely due to insufficient treatment of water-protein interactions, an issue that MACELES-OFF handles well.

Figure 9.

Torsional free energy surfaces (FES) of solvated alanine dipeptide computed at 310 K using different potentials. a: FES based on AMBER ff99SB/TIP3P. b: FES based on MACELES-OFF. Four local minima are distinguished in the landscape, corresponding to distinct conformations defined by the central ϕ and ψ backbone angles: antiparallel β-sheet (aβ), polyproline II (PPII), right-handed α-helix (α-H(R)), and left-handed α-helix (α-H(L)) structures. In addition, compared to AMBER ff99SB/TIP3P, MACELES-OFF identifies another local minimum that corresponds to the right-handed α-helix region (αD).

For the 1FSV protein, we performed 2 ns MD simulations in vacuum for all three models, starting from the crystallographic PDB structure. The 1FSV protein poses a significant challenge due to its 10 positively and 5 negatively charged groups. As shown in Fig. 10, both MACELES-OFF and MACE-OFF24(M) began folding the protein into a more compact conformation, characteristic of gas-phase behavior. However, their trajectories diverged over time. MACELES-OFF exhibited more stable dynamics, remaining close to AMBER ff99SB-ILDN [72], with a root-mean-square deviation (RMSD) of 5.14Å with respect to the 1FSV PDB structure, just 0.085Å lower than AMBER’s 5.22Å. In contrast, MACE-OFF24(M) showed continued RMSD drift and over-compaction, reaching 4.43Å.

Figure 10.

The root mean square deviation (RMSD) of the 1FSV protein during a 2 ns gas-phase NVT MD simulation at 310 K. Time evolution of RMSD and protein structures compared to the initial PDB structure for AMBER ff99SB (blue), MACE-OFF24(M) (orange), and MACELES-OFF (red).

Further structural analysis confirmed these trends through the calculation of salt bridges and hydrogen bond networks. MACE-OFF24(M) exhibited the highest number of salt bridges (159, +35% vs. PDB) and hydrogen bonds (286, +55%), indicating an over-collapsed structure. By comparison, MACELES-OFF formed 149 salt bridges and 263 hydrogen bonds, values more consistent with AMBER (143 and 258, respectively).

Overall, the results above suggest that MACELES-OFF can provide a good description of the FES of a small peptide solvated in water. The comparison between MACELES-OFF and MACE-OFF for the folding of 1FSV in vacuum indicates that long-range electrostatics play a critical role in governing the dynamics and structural stability of the protein. These findings underscore the importance of including long-range interactions in simulations of biomolecular systems.

V. DISCUSSION

We present LES as a “universal” augmentation framework for adding long-range electrostatics to short-range MLIPs. It is universal in three ways. First, LES can work together with many short-ranged MLIPs in an architecture-agnostic way, as long as there are local invariant features for atoms. We have patched LES into MACE, NequIP, CACE, Allegro, UMA, and MatGL, and it is our ongoing commitment to include LES in other MLIPs, making it more widely accessible. Interestingly, LES works well with the CACE models with three-body interactions (ν=2), which are mathematically equivalent to the earlier generation of MLIPs (e.g., HDNNPs [28], Gaussian Approximation Potentials [29], and DeepMD [6]). This suggests that LES will also complement those lower-body-order MLIP methods well. The LES-augmented HDNNPs would indeed be quite similar to the third-generation HDNNPs (3G-HDNNPs) [9], except that they are not explicitly trained on DFT partial charges. The poor performance of 3G-HDNNPs in many systems [9] may thus be attributed to the explicit charge training, rather than to inferring charges from energies and forces.

Second, the LES augmentation is useful across a wide range of systems. Across the three benchmarks (bulk water, dipeptides, and Au2 on MgO(001)), the inclusion of LES yields consistent, architecture-agnostic improvements. Even when the reduction of energy and force RMSEs is modest, LES significantly improves physical observables. For example, on the water and dipeptide sets, LES achieves accurate BEC predictions despite training exclusively on energies and forces. BECs are not only well-defined physical quantities that serve as sensitive probes of electrostatics, but are also critical for calculating a range of electrical response properties such as dielectric constants, ionic conductivities, and dipole correlation functions [27].

Third, LES can be scaled to large datasets with chemical diversity and can be used to train universal MLIPs. As a demonstration, the MACELES-OFF model shows better accuracy than the short-ranged baseline models, encodes the correct electrostatics, and exhibits better transferability. For instance, the MACE-OFF23 models make accurate predictions across a remarkably wide range of small molecules, biological systems, and molecular crystals, but they are less accurate in predicting the densities and heats of vaporization of molecular liquids. Such shortcomings are attributed to the lack of long-range interactions [45]. Indeed, incorporating LES effectively solves these issues and provides much-improved predictions for the liquids. In addition, electrical response properties such as IR spectra become readily available.

Additionally, using LES only requires standard energy and force labels. This is an advantage for scaling up the training of universal MLIPs with large-scale datasets, as most such datasets [52, 53, 73, 74], especially of extended systems, do not contain explicit polarization or charge information. However, the dipole moments of gas-phase molecules are more readily available, and in principle, one can include these in the training of partial charges under the LES framework by adding a dipole loss function, as in AIMNET2 [20], SO3LR [21], and MPNICE [75]. Towards true foundation models with full electrostatic physics, one also needs to address the issue of the field-dependence of atomic charges, which LES currently does not handle, but we have ongoing work in this direction.

To conclude, by incorporating long-range electrostatics and without training on labels other than energy, forces (and stresses), the LES method addresses a core limitation of current MLIPs. The LES library works as a drop-in patch for many short-ranged MLIPs, and demonstrates reliable and consistent performance across a wide range of architectures, chemical system types, and data sizes.

VI. METHODS

A. Implementation

LES is implemented in PyTorch [76] and is easily compatible with MLIP packages also implemented in PyTorch. In CACE [34], we implemented a LESwrapper, and we note that CACE has its own native LES implementation [25] which yields consistent results. In MACE [33], we added a MACELES model which can be used in the same way as the original MACE model, and all training and evaluation procedures stay the same. For NequIP [32] and Allegro [42], we developed an extension package, NequIP-LES, leveraging the modular structure of the redesigned NequIP framework [49]. One can specify the baseline model through the base_model key and enable BEC inference using the compute_bec flag. The framework automatically augments the baseline architecture by adding charge prediction and Ewald summation modules, while keeping all training and evaluation procedures unchanged. After training, BECs can be computed using either the callback or the model modifier options. In MatGL [36], inference and training are treated in different modules. We extended the Potential module with calc_BEC = True to include LES and autograd to compute the BECs for inference, and we extended the PotentialLightningModule with include_long_range = True to call Potential with long-range interactions for training. For UMA [43], we implemented LES wrappers to handle translation between FAIRChem’s data structures and the LES code. Inference of BECs in particular is handled via a dedicated module for computing and saving BEC values. These wrappers are used in custom energy-force output heads in the UMA implementation. UMA-LES can be used by simply changing the output head and specifying LES parameters to the model. Training and inference without BECs are entirely unchanged from standard FAIRChem procedures.
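
The augmentation pattern shared by these integrations can be illustrated with a minimal numpy sketch (a schematic of the latent-charge idea, not the LES library’s actual API; the function name, prefactors, and units below are illustrative): per-atom latent charges, predicted from local invariant features, enter a Gaussian-smeared reciprocal-space Ewald sum that is added to the short-range energy.

```python
import numpy as np

def reciprocal_ewald_energy(positions, charges, cell, sigma=1.0, k_cutoff=np.pi):
    """Smeared reciprocal-space Ewald energy (arbitrary units).

    positions: (N, 3) Cartesian coordinates; cell: (3, 3) lattice vectors (rows);
    charges: (N,) latent charges. Only k-vectors with 0 < |k| <= k_cutoff are summed.
    """
    volume = abs(np.linalg.det(cell))
    recip = 2.0 * np.pi * np.linalg.inv(cell).T   # reciprocal lattice vectors (rows)
    nmax = int(np.ceil(k_cutoff / np.min(np.linalg.norm(recip, axis=1)))) + 1
    energy = 0.0
    for n in np.ndindex(*(2 * nmax + 1,) * 3):
        m = np.array(n) - nmax
        if not m.any():
            continue                               # skip k = 0
        k = m @ recip
        k2 = k @ k
        if k2 > k_cutoff ** 2:
            continue
        # structure factor S(k) built from the latent charges
        sk = np.sum(charges * np.exp(1j * positions @ k))
        energy += (4 * np.pi / k2) * np.exp(-sigma ** 2 * k2 / 2) * abs(sk) ** 2
    return energy / (2.0 * volume)
```

Because only the periodic structure factor enters, the long-range energy is invariant under a rigid translation of all atoms by a lattice vector, which is a convenient sanity check for any such implementation.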

B. Details for benchmarks

The training scripts with hyperparameters, as well as the trained models for the water, dipeptides, and Au2 on MgO(001) datasets, are included in the SI repository. For the LES augmentation part, we always use the default parameters: σ=1Å and dl=2Å (the latter corresponds to a k-point cutoff of kc=2π/dl=π Å-1 in the Ewald summation).

Fig. 11 shows the relation between force RMSE and BEC RMSE for LES-augmented MLIPs (panels a and b), and the relation between the reduction in force RMSE after LES augmentation and the corresponding BEC RMSE (panels c and d), for water and dipeptide systems.

Table I summarizes the energy differences (see Fig. 4) between wetting and non-wetting configurations for doped and undoped substrates in the Au2-MgO(001) system.

Table I.

Energy difference (E_wetting − E_non-wetting) in meV between the wetting and non-wetting configurations for doped and undoped substrates.

MLIP           Hyperparameters        Undoped    Doped
MACE           r=5.5 Å, nl=1           419.0     419.0
MACE LES       r=5.5 Å, nl=1           919.3     −56.5
NequIP         r=4.0 Å, nl=2, ℓ=1      390.2     390.3
NequIP LES     r=4.0 Å, nl=2, ℓ=1      973.3     −45.1
CHGNet         r=4.5 Å, nl=1           411.9     411.9
CHGNet LES     r=4.5 Å, nl=1           925.5    −103.5
CACE           r=5.5 Å, nl=1, ν=2      421.5     421.4
CACE LES       r=5.5 Å, nl=1, ν=2      931.7     −72.2
CACE           r=5.5 Å, nl=1, ν=3      430.9     430.9
CACE LES       r=5.5 Å, nl=1, ν=3      930.9     −71.1
Allegro        r=5.5 Å, nl=2, ℓ=1      429.3     429.6
Allegro LES    r=5.5 Å, nl=2, ℓ=1      934.7     −69.4
DFT reference                          934.8     −66.9

Fig. 12a compares the BEC tensors from DFT and those predicted by the LES-augmented CACE model (r=4.5 Å, nl=1, ν=3) for the same 100 water configurations used in Fig. 2b. Fig. 12b repeats the same comparison for the corresponding 2 × 2 × 2 replicated supercells, as a convergence test for the k summation in Eqn. 5. The inferred BECs are nearly identical, with the supercell showing slightly improved agreement with DFT (R2 coefficients increasing from 0.976 to 0.985 over all tensor components). These results confirm that LES yields convergent and reliable BEC predictions under periodic boundary conditions.

Figure 12.

Parity plots comparing Born effective charge (BEC) tensors Z* from the LES-augmented CACE (r=4.5 Å, nl=1, ν=3) with RPBE-D3 DFT (Ref. [57]) for 100 bulk water configurations. Main panels compare the diagonal elements Z*αα, and the inset shows the off-diagonal elements (Z*αβ with α ≠ β). a: Primitive cell; b: 2 × 2 × 2 replicated supercell.

C. Details on training the MACELES-OFF model

The training scripts as well as the trained models for the MACELES-OFF are included in the SI repository. The hyperparameters are summarized and compared with the original MACE-OFF models in Table II.

Table III summarizes the test errors for each dataset. For the PubChem dataset, 13 outlier configurations were removed from a total of 34,093 test structures in accordance with the MACE-OFF paper [45].

Table III.

Test set mean absolute errors (MAEs) and root mean square errors (RMSEs) for energies and forces of the MACE-OFF23 (S, M, L) and MACELES-OFF models on organic molecules, compared to the underlying DFT reference data [45].

Metric            Model   PubChem   DES370K Monomers   DES370K Dimers   Dipeptides   Solvated Amino Acids   Water   QMugs
MAE E (meV/at)    23S      1.41      1.04               0.98             0.84         1.60                   1.67    1.03
                  23M      0.91      0.63               0.58             0.52         1.21                   0.76    0.69
                  23L      0.88      0.59               0.54             0.42         0.98                   0.83    0.45
                  LES      0.71      0.54               0.47             0.52         0.82                   0.69    0.76
MAE F (meV/Å)     23S     35.68     17.63              16.31            25.07        38.56                  28.53   41.45
                  23M     20.57      9.36               9.02            14.27        23.26                  15.27   23.58
                  23L     14.75      6.58               6.62            10.19        19.43                  13.57   16.93
                  LES     17.24      7.65               7.48            11.46        18.37                  12.54   20.11
RMSE E (meV/at)   23S      2.74      1.47               1.51             1.26         1.98                   2.07    1.28
                  23M      2.02      0.90               0.91             0.85         1.55                   0.99    0.89
                  23L      2.48      0.84               0.87             0.70         1.32                   0.99    0.58
                  LES      1.45      0.72               0.72             0.69         1.01                   0.90    0.94
RMSE F (meV/Å)    23S     61.83     26.15              28.48            36.97        53.55                  39.33   62.46
                  23M     40.51     14.31              16.81            22.25        32.19                  21.40   36.73
                  23L     33.34     10.27              12.81            16.19        26.91                  18.78   27.17
                  LES     35.26     11.70              14.42            17.62        25.31                  17.08   32.57

Reference Born effective charges (BECs) and dipole moments for the MACELES-OFF were computed in PySCF [78] using the ωB97M-D3(BJ) functional with the def2-SVP basis set. Code for calculating BECs can be found in the infrared module of the properties extension of PySCF (https://github.com/pyscf/properties.git).
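
For reference, the BEC tensor is the derivative of the cell dipole moment with respect to an atomic displacement, Z*_{i,αβ} = ∂μ_α/∂r_{i,β}. A finite-difference sketch on a toy rigid point-charge model (purely illustrative, not the PySCF workflow) recovers Z*_i = q_i times the identity:

```python
import numpy as np

def dipole(positions, charges):
    """Dipole moment of a point-charge model: mu = sum_i q_i r_i."""
    return charges @ positions

def finite_difference_bec(positions, charges, i, h=1e-5):
    """Z*_i via central differences: Z[a, b] = d mu_a / d r_{i,b}."""
    z = np.zeros((3, 3))
    for b in range(3):
        plus, minus = positions.copy(), positions.copy()
        plus[i, b] += h
        minus[i, b] -= h
        z[:, b] = (dipole(plus, charges) - dipole(minus, charges)) / (2 * h)
    return z

# Toy diatomic with fixed partial charges (hypothetical values)
pos = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0]])
q = np.array([-0.8, 0.8])
z0 = finite_difference_bec(pos, q, 0)   # ~ -0.8 * identity for a rigid charge
```

In a real electronic-structure calculation the charges rearrange upon displacement, so the BEC tensors deviate from this rigid-charge limit; that deviation is exactly what the LES-inferred charges must capture.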

D. Details on MD simulations using the MACELES-OFF model

a. Water

We performed equilibrium NVT simulations of bulk water using the MACE-OFF23 (S, M) and MACELES-OFF models in ASE [79] at a density of 0.997g/cm3 and 300 K for a system of 64 water molecules, employing the Nosé-Hoover thermostat for Fig. 7a and b. Each MD simulation was run for 200,000 steps with a timestep of 0.25 fs, corresponding to 50 ps.

The time-dependent polarization current of the system is given by $\mathbf{J}(t)=\sum_{i=1}^{N} Z_i^{*}(t)\,\mathbf{v}_i(t)$, where $Z_i^{*}(t)$ is the BEC tensor and $\mathbf{v}_i(t)$ is the velocity of atom $i$. The IR spectra are obtained via the Fourier transform of the current-current autocorrelation function,

$$I(\omega)\propto\int_0^{T}\mathrm{d}t\,\langle \mathbf{J}(0)\cdot\mathbf{J}(t)\rangle\, e^{-i\omega t}. \qquad (6)$$

The IR spectrum for Fig. 7b was calculated from the MACELES-OFF MD trajectory, and the resulting raw intensity was smoothed by convolving the spectrum with a Gaussian kernel.
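
The autocorrelation-and-Fourier-transform pipeline described above can be sketched with numpy (a schematic that omits unit prefactors and the Gaussian smoothing; the array shapes are assumptions for illustration):

```python
import numpy as np

def polarization_current(becs, velocities):
    """J(t) = sum_i Z*_i(t) v_i(t); becs: (T, N, 3, 3), velocities: (T, N, 3)."""
    return np.einsum('tiab,tib->ta', becs, velocities)

def ir_spectrum(current, dt):
    """Fourier transform of <J(0).J(t)>; unit prefactors omitted."""
    n_steps = current.shape[0]
    # current-current autocorrelation, summed over Cartesian components
    acf = sum(np.correlate(current[:, a], current[:, a], mode='full')[n_steps - 1:]
              for a in range(current.shape[1]))
    acf = acf / np.arange(n_steps, 0, -1)      # unbiased normalization per lag
    intensity = np.abs(np.fft.rfft(acf))
    freqs = np.fft.rfftfreq(n_steps, d=dt)
    return freqs, intensity
```

As a check, a single atom oscillating at a known frequency with an identity BEC tensor produces a spectrum peaked at that frequency.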

We simulated the density isobar of liquid water at 1 atm by investigating four temperatures: 275 K, 295 K, 315 K, and 336 K. These NPT simulations employed a system of 64 water molecules, with a 1.0 fs timestep over 300 ps of simulation time, utilizing the Nosé-Hoover thermostat and Berendsen barostat implemented in ASE.
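
The densities reported from these NPT runs follow from the instantaneous box volume; as a minimal sketch (the box edge below is a hypothetical value chosen for illustration, not taken from our trajectories):

```python
import numpy as np

AVOGADRO = 6.02214076e23   # mol^-1
M_WATER = 18.01528         # g/mol

def water_density(n_molecules, volume_A3):
    """Mass density in g/cm^3 from a box volume given in cubic Angstroms."""
    mass_g = n_molecules * M_WATER / AVOGADRO
    return mass_g / (volume_A3 * 1e-24)   # 1 A^3 = 1e-24 cm^3

# A ~12.42 A box edge gives roughly the ambient density for 64 molecules
rho = water_density(64, 12.42 ** 3)
```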

b. Molecular liquids

For the preparation of MD simulations in the liquid phase, SMILES strings for the organic molecules were first obtained from the PubChem database [80] and subsequently converted to PDB format using the Python package RDKit [81]. The initial molecular configurations were generated at 60% of experimental density using the GROMACS command-line interface [82], followed by structural pre-optimization with the General AMBER Force Field (GAFF) [83]. A two-stage optimization protocol was then implemented: geometry optimization using the MACELES-OFF model with the BFGS algorithm in ASE, followed by NVT simulations for 5,000 steps using a Nosé-Hoover thermostat with a timestep of 1.0 fs to achieve structural relaxation. Production NPT simulations were performed for 300 ps using a 1.0 fs timestep, using the same Nosé-Hoover thermostat coupled with the Martyna-Tobias-Klein (MTK) barostat [84]. Final density and potential energy values were computed by averaging over the last 150 ps to ensure proper equilibration. To compute the enthalpy of vaporization, potential energies for the gas phase were obtained based on simulations of isolated molecules in a non-periodic box. The ideal gas approximation was employed to account for pressure effects in the gas phase.
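
Under the ideal-gas approximation for the vapor phase, the enthalpy of vaporization reduces to ΔvapH ≈ (E_gas − E_liq/N) + RT, where E_liq/N is the average potential energy per molecule in the liquid. A worked example with hypothetical energies (not values from our simulations):

```python
# Hypothetical per-molecule potential energies (kcal/mol), for illustration only
R_KCAL = 1.987204e-3    # gas constant in kcal/(mol K)
T = 298.15              # temperature in K

e_gas = -10.0           # average potential energy of the isolated molecule
e_liq_per_mol = -18.5   # average liquid potential energy per molecule

# Ideal-gas pressure-volume term p*dV ~ RT
dvap_h = (e_gas - e_liq_per_mol) + R_KCAL * T
```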

It should also be noted that the 1,1,2,2-tetrachloroethane system vaporized under NPT simulation using MACELES-OFF, although it remained liquid using MACE-OFF23(M). This might be due to the difficulty of preparing the initial condition for this system. Despite this exception, the MACELES-OFF successfully simulated another chlorinated hydrocarbon, 1,4-dichlorobutane, a structurally analogous compound that was an outlier in the MACE-OFF23(M) predictions of ΔvapH. Furthermore, the MACELES-OFF accurately predicted both the density and enthalpy of vaporization for chlorinated aniline, further demonstrating its capacity to model intra- and intermolecular interactions involving chlorine atoms.

Simulated density, enthalpy of vaporization, and optimized initial configurations for both liquid and gas phases are provided in the SI repository.

c. Simulations of biological molecules

The PDB structure of alanine dipeptide was solvated in a cubic box of 28Å using the addSolvent method in OpenMM, resulting in a system of 2,050 atoms. The system was modeled using the AMBER ff99SB-ILDN force field [72] and the TIP3P water model [85]. The structure was first geometry-optimized with a tolerance of 10 kJ/mol/nm, followed by 50 ps of NPT equilibration at 310 K and 1.013 bar using a Monte Carlo barostat and Langevin thermostat, with a timestep of 1.0 fs and a friction coefficient of 1.0 ps-1. Production NVT simulations with well-tempered metadynamics were performed for 2 ns for MACELES-OFF and 5 ns for AMBER ff99SB-ILDN/TIP3P along the central ϕ and ψ backbone angles using ASE patched with the PLUMED code (version 2.9) [86, 87].

For both collective variables (ϕ and ψ), we employed a Gaussian height of 2.5 kJ/mol and a bias factor of 30. Gaussian potentials were deposited on the energy landscape every 500 steps, and their width was adapted on the fly based on the local diffusivity [88]. The FES was computed from reweighting.
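
The well-tempered deposition rule scales each new Gaussian height by the bias already accumulated at the current collective-variable value, h = h0 exp(−V_bias(s)/(kB ΔT)) with ΔT = (γ − 1)T set by the bias factor γ. A minimal 1D sketch of this rule (illustrative grid and parameters, not our production PLUMED setup):

```python
import numpy as np

def deposit(s_traj, h0=2.5, sigma=0.3, bias_factor=30.0, kbt=2.6):
    """Deposit well-tempered Gaussians along a 1D CV trajectory.

    Returns the bias on a grid and the sequence of deposited heights.
    kbt is k_B*T in kJ/mol (roughly 310 K); kb_dT = kbt * (gamma - 1).
    """
    grid = np.linspace(-np.pi, np.pi, 200)
    bias = np.zeros_like(grid)
    kb_dT = kbt * (bias_factor - 1.0)
    heights = []
    for s in s_traj:
        v_here = np.interp(s, grid, bias)          # current bias at s
        h = h0 * np.exp(-v_here / kb_dT)           # tempered height
        bias += h * np.exp(-(grid - s) ** 2 / (2 * sigma ** 2))
        heights.append(h)
    return grid, bias, heights
```

A trajectory stuck at a single point deposits monotonically shrinking Gaussians, which is what makes the bias converge instead of growing without bound.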

For the 1FSV protein, the PDB structure [68] was placed in a cubic simulation box at a density of 0.6g/cm3 to avoid interatomic overlaps and to maintain periodic boundary conditions as required by the LES framework. Geometry optimization was performed in three successive stages using the FIRE optimizer [89], with gradually decreasing convergence criteria down to a maximum force of 0.2eV/Å. We then ran 2 ns of MD production in the NVT ensemble at 310 K using a 1 fs timestep and a friction coefficient of 0.1ps-1, starting from the PDB crystallographic structure.

Salt bridge analysis was performed by counting the number of interactions within a 4.0Å cutoff between the side-chain oxygen atoms of acidic residues (e.g., ASP) and the nitrogen atoms of basic residues (e.g., ARG). Similarly, hydrogen bonds were assessed by counting interactions within a 3.5Å cutoff between donor and acceptor atoms.
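
This cutoff-based contact counting can be sketched as a pairwise distance test (a schematic with toy coordinates, not the analysis script used here):

```python
import numpy as np

def count_contacts(coords_a, coords_b, cutoff):
    """Number of (a, b) pairs with |r_a - r_b| <= cutoff (no periodic images)."""
    diff = coords_a[:, None, :] - coords_b[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return int(np.count_nonzero(dist <= cutoff))

# Toy coordinates (Angstrom): two acidic oxygens, two basic nitrogens
oxygens = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
nitrogens = np.array([[3.5, 0.0, 0.0], [10.0, 3.0, 0.0]])
n_salt_bridges = count_contacts(oxygens, nitrogens, 4.0)
```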

d. Timing

Fig. 13 compares the inference speed of the MACE-OFF23 (S, M, L) models and the MACELES-OFF model for MD simulations of liquid water using both the ASE and OpenMM implementations on a single NVIDIA L40S GPU with 48 GB of memory. All MACE-OFF23 models were also converted to single precision (float32) to ensure a fair comparison with MACELES-OFF, which was trained and evaluated in float32 precision. As shown, the inclusion of long-range interactions via LES introduces only minimal computational overhead. The MACELES-OFF model achieves a comparable inference speed to the MACE-OFF23(M) at the same precision (float32), which shares the same irreducible representation order (ℓ=1), and also to the MACE-OFF23(S) at double precision. Moreover, the MACELES-OFF model is significantly faster than the MACE-OFF23(L), which exhibits similar RMSE values to MACELES-OFF, as summarized in Table III. No significant speed differences were observed between the ASE and OpenMM implementations for any of the models, and all models demonstrate favorable scaling with system size. In practice, the MACELES-OFF model supports simulations with up to approximately 6,000 atoms on a single GPU, whereas the MACE-OFF23(S) float32 model using ASE can handle up to around 35,000 atoms.

Figure 14.

For Table of Contents Only

Acknowledgements

Research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R35GM159986. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. D.K. and B.C. acknowledge funding from the Toyota Research Institute Synthesis Advanced Research Challenge. T.J.I., D.S.K., and P.Z. acknowledge funding from the BIDMaP Postdoctoral Fellowship. T.J.I. used resources of the National Energy Research Scientific Computing Center (NERSC), a Department of Energy Office of Science User Facility, under NERSC award DOEERCAP0031751 ‘GenAI@NERSC’. The authors thank Bowen Deng for valuable discussions on the MatGL implementation, and thank Gábor Csányi for stimulating discussions.

Footnotes

Competing Interests B.C. has an equity stake in AIMATX Inc. The University of California, Berkeley has filed a provisional patent for the Latent Ewald Summation algorithm.

Data availability

The training sets, training scripts, and trained potentials are available at https://github.com/ChengUCB/les_fit.

Code availability

The LES library is publicly available at https://github.com/ChengUCB/les. The CACE package with the LES implementation is available at https://github.com/BingqingCheng/cace. The MACE package with the LES implementation is available at https://github.com/ACEsuit/mace. The NequIP and Allegro LES extension package is available at https://github.com/ChengUCB/NequIP-LES. The MatGL package with the LES implementation is available at https://github.com/ChengUCB/matgl. The UMA package with the LES implementation is available at https://github.com/santi921/fairchem/tree/les_branch.

References

  • [1]. Keith John A, Vassilev-Galindo Valentin, Cheng Bingqing, Chmiela Stefan, Gastegger Michael, Müller Klaus-Robert, and Tkatchenko Alexandre, “Combining machine learning and computational chemistry for predictive insights into chemical systems,” Chemical Reviews 121, 9816–9872 (2021).
  • [2]. Unke Oliver T, Chmiela Stefan, Sauceda Huziel E, Gastegger Michael, Poltavsky Igor, Schütt Kristof T, Tkatchenko Alexandre, and Müller Klaus-Robert, “Machine learning force fields,” Chemical Reviews 121, 10142–10186 (2021).
  • [3]. Niblett Samuel P, Galib Mirza, and Limmer David T, “Learning intermolecular forces at liquid-vapor interfaces,” The Journal of Chemical Physics 155 (2021).
  • [4]. Grisafi Andrea and Ceriotti Michele, “Incorporating long-range physics in atomic-scale machine learning,” The Journal of Chemical Physics 151 (2019).
  • [5]. Huguenin-Dumittan Kevin K, Loche Philip, Ni Haoran, and Ceriotti Michele, “Physics-inspired equivariant descriptors of nonbonded interactions,” The Journal of Physical Chemistry Letters 14, 9612–9618 (2023).
  • [6]. Zhang Linfeng, Wang Han, Muniz Maria Carolina, Panagiotopoulos Athanassios Z, Car Roberto, and E Weinan, “A deep potential model with long-range electrostatic interactions,” The Journal of Chemical Physics 156 (2022).
  • [7]. Monacelli Lorenzo and Marzari Nicola, “Electrostatic interactions in atomistic and machine-learned potentials for polar materials,” arXiv preprint arXiv:2412.01642 (2024).
  • [8]. Unke Oliver T and Meuwly Markus, “PhysNet: A neural network for predicting energies, forces, dipole moments, and partial charges,” Journal of Chemical Theory and Computation 15, 3678–3693 (2019).
  • [9]. Ko Tsz Wai, Finkler Jonas A, Goedecker Stefan, and Behler Jörg, “A fourth-generation high-dimensional neural network potential with accurate electrostatics including non-local charge transfer,” Nature Communications 12, 398 (2021).
  • [10]. Gao Ang and Remsing Richard C, “Self-consistent determination of long-range electrostatics in neural network potentials,” Nature Communications 13, 1572 (2022).
  • [11]. Sifain Andrew E, Lubbers Nicholas, Nebgen Benjamin T, Smith Justin S, Lokhov Andrey Y, Isayev Olexandr, Roitberg Adrian E, Barros Kipton, and Tretiak Sergei, “Discovering a transferable charge assignment model using machine learning,” The Journal of Physical Chemistry Letters 9, 4495–4501 (2018).
  • [12]. Gong Sheng, Zhang Yumin, Mu Zhenliang, Pu Zhichen, Wang Hongyi, Han Xu, Yu Zhiao, Chen Mengyi, Zheng Tianze, Wang Zhi, Chen Lifei, Yang Zhenze, Wu Xiaojie, Shi Shaochen, Gao Weihao, Yan Wen, and Xiang Liang, “A predictive machine learning force-field framework for liquid electrolyte development,” Nature Machine Intelligence 7, 543–552 (2025).
  • [13]. Shaidu Yusuf, Pellegrini Franco, Küçükbenli Emine, Lot Ruggero, and de Gironcoli Stefano, “Incorporating long-range electrostatics in neural network potentials via variational charge equilibration from shortsighted ingredients,” npj Computational Materials 10, 47 (2024).
  • [14]. Yu Hongyu, Hong Liangliang, Chen Shiyou, Gong Xingao, and Xiang Hongjun, “Capturing long-range interaction with reciprocal space neural network,” arXiv preprint arXiv:2211.16684 (2022).
  • [15]. Kosmala Arthur, Gasteiger Johannes, Gao Nicholas, and Günnemann Stephan, “Ewald-based long-range message passing for molecular graphs,” in International Conference on Machine Learning (PMLR, 2023) pp. 17544–17563.
  • [16]. Plé Thomas, Lagardère Louis, and Piquemal Jean-Philip, “Force-field-enhanced neural network interactions: from local equivariant embedding to atom-in-molecule properties and long-range effects,” Chemical Science 14, 12554–12569 (2023).
  • [17]. Inizan Théo Jaffrelot, Plé Thomas, Adjoua Olivier, Ren Pengyu, Gökcan Hatice, Isayev Olexandr, Lagardère Louis, and Piquemal Jean-Philip, “Scalable hybrid deep neural networks/polarizable potentials biomolecular simulations including long-range effects,” Chemical Science 14, 5438–5452 (2023).
  • [18]. Caruso Alessandro, Venturin Jacopo, Giambagli Lorenzo, Rolando Edoardo, Noé Frank, and Clementi Cecilia, “Extending the range of graph neural networks: Relaying attention nodes for global encoding,” arXiv preprint arXiv:2502.13797 (2025).
  • [19]. Rinaldi Matteo, Bochkarev Anton, Lysogorskiy Yury, and Drautz Ralf, “Charge-constrained atomic cluster expansion,” Physical Review Materials 9, 033802 (2025).
  • [20]. Anstine Dylan M, Zubatyuk Roman, and Isayev Olexandr, “AIMNet2: a neural network potential to meet your neutral, charged, organic, and elemental-organic needs,” Chemical Science 16, 10228–10244 (2025).
  • [21]. Kabylda Adil, Frank J Thorben, Dou Sergio Suarez, Khabibrakhmanov Almaz, Sandonas Leonardo Medrano, Unke Oliver T, Chmiela Stefan, Müller Klaus-Robert, and Tkatchenko Alexandre, “Molecular simulations with a pretrained neural network and universal pairwise force fields,” ChemRxiv (2025).
  • [22]. Faller Carolin, Kaltak Merzuk, and Kresse Georg, “Density-based long-range electrostatic descriptors for machine learning force fields,” The Journal of Chemical Physics 161, 214701 (2024).
  • [23]. Plé Thomas, Adjoua Olivier, Benali Anouar, Posenitskiy Evgeny, Villot Corentin, Lagardère Louis, and Piquemal Jean-Philip, “A foundation model for accurate atomistic simulations in drug design,” ChemRxiv (2025).
  • [24]. Loche Philip, Huguenin-Dumittan Kevin K, Honarmand Melika, Xu Qianjun, Rumiantsev Egor, How Wei Bin, Langer Marcel F, and Ceriotti Michele, “Fast and flexible long-range models for atomistic machine learning,” The Journal of Chemical Physics 162, 142501 (2025).
  • [25]. Cheng Bingqing, “Latent Ewald summation for machine learning of long-range interactions,” npj Computational Materials 11, 80 (2025).
  • [26]. King Daniel S, Kim Dongjin, Zhong Peichen, and Cheng Bingqing, “Machine learning of charges and long-range interactions from energies and forces,” Nature Communications 16, 8763 (2025).
  • [27]. Zhong Peichen, Kim Dongjin, King Daniel S, and Cheng Bingqing, “Machine learning interatomic potential can infer electrical response,” arXiv preprint arXiv:2504.05169 (2025).
  • [28]. Behler Jörg and Parrinello Michele, “Generalized neural-network representation of high-dimensional potential-energy surfaces,” Physical Review Letters 98, 146401 (2007).
  • [29]. Bartók Albert P, Payne Mike C, Kondor Risi, and Csányi Gábor, “Gaussian approximation potentials: The accuracy of quantum mechanics, without the electrons,” Physical Review Letters 104, 136403 (2010).
  • [30]. Shapeev Alexander V, “Moment tensor potentials: A class of systematically improvable interatomic potentials,” Multiscale Modeling & Simulation 14, 1153–1173 (2016).
  • [31]. Drautz Ralf, “Atomic cluster expansion for accurate and transferable interatomic potentials,” Physical Review B 99, 014104 (2019).
  • [32]. Batzner Simon, Musaelian Albert, Sun Lixin, Geiger Mario, Mailoa Jonathan P, Kornbluth Mordechai, Molinari Nicola, Smidt Tess E, and Kozinsky Boris, “E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials,” Nature Communications 13, 2453 (2022).
  • [33]. Batatia Ilyes, Kovacs David P, Simm Gregor, Ortner Christoph, and Csányi Gábor, “MACE: Higher order equivariant message passing neural networks for fast and accurate force fields,” Advances in Neural Information Processing Systems 35, 11423–11436 (2022).
  • [34]. Cheng Bingqing, “Cartesian atomic cluster expansion for machine learning interatomic potentials,” npj Computational Materials 10, 157 (2024).
  • [35]. Qu Eric and Krishnapriyan Aditi S, “The importance of being scalable: Improving the speed and accuracy of neural network interatomic potentials across chemical domains,” in The Thirty-eighth Annual Conference on Neural Information Processing Systems (2024).
  • [36]. Ko Tsz Wai, Deng Bowen, Nassar Marcel, Barroso-Luque Luis, Liu Runze, Qi Ji, Thakur Atul C, Mishra Adesh Rohan, Liu Elliott, Ceder Gerbrand, Miret Santiago, and Ong Shyue Ping, “Materials Graph Library (MatGL), an open-source graph deep learning library for materials science and chemistry,” npj Computational Materials 11, 253 (2025).
  • [37].Fu Xiang, Wood Brandon M., Barroso-Luque Luis, Levine Daniel S., Gao Meng, Dzamba Misko, and Zitnick C. Lawrence, “Learning smooth and expressive interatomic potentials for physical property prediction,” arXiv preprint arXiv:2502.12147 (2025). [Google Scholar]
  • [38].Pozdnyakov Sergey and Ceriotti Michele, “Smooth, exact rotational symmetrization for deep learning on point clouds,” in Advances in Neural Information Processing Systems, Vol. 36 (Curran Associates, Inc., 2023) pp. 79469–79501. [Google Scholar]
  • [39].Pelaez Raul P., Simeon Guillem, Galvelis Raimondas, Mirarchi Antonio, Eastman Peter, Doerr Stefan, Thölke Philipp, Markland Thomas E., and De Fabritiis Gianni, “Torchmd-net 2.0: Fast neural network potentials for molecular simulations,” Journal of Chemical Theory and Computation 20, 4076–4087 (2024) [DOI] [PubMed] [Google Scholar]
  • [40].Haghighatlari Mojtaba, Li Jie, Guan Xingyi, Zhang Oufan, Das Akshaya, Stein Christopher J., Heidar-Zadeh Farnaz, Liu Meili, Head-Gordon Martin, Bertels Luke, Hao Hongxia, Leven Itai, and Head-Gordon Teresa, “Newtonnet: A newtonian message passing network for deep learning of interatomic potentials and forces,” Digital Discovery 1, 333–343 (2022) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [41].Liao Yi-Lun and Smidt Tess, “Equiformer: Equivariant graph attention transformer for 3d atomistic graphs,” in International Conference on Learning Representations (2023). [Google Scholar]
  • [42].Musaelian Albert, Batzner Simon, Johansson Anders, Sun Lixin, Owen Cameron J, Kornbluth Mordechai, and Kozinsky Boris, “Learning local equivariant representations for large-scale atomistic dynamics,” Nature Communications 14, 579 (2023). [Google Scholar]
  • [43].Wood Brandon M., Dzamba Misko, Fu Xiang, Gao Meng, Shuaibi Muhammed, Barroso-Luque Luis, Abdelmaqsoud Kareem, Gharakhanyan Vahe, Kitchin John R., Levine Daniel S., Michel Kyle, Sriram Anuroop, Cohen Taco, Das Abhishek, Rizvi Ammar, Sahoo Sushree Jagriti, Ulissi Zachary W., and Zitnick C. Lawrence, “Uma: A family of universal models for atoms,” arXiv preprint arXiv:2506.23971 (2025). [Google Scholar]
  • [44].Eastman Peter, Behara Pavan Kumar, Dotson David L., Galvelis Raimondas, Herr John E., Horton Josh T., Mao Yuezhi, Chodera John D., Pritchard Benjamin P., Wang Yuanqing, De Fabritiis Gianni, and Markland Thomas E, “Spice, a dataset of drug-like molecules and peptides for training machine learning potentials,” Scientific Data 10, 11 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [45].Kovács Dávid Péter, Moore J. Harry, Browning Nicholas J., Batatia Ilyes, Horton Joshua T., Pu Yixuan, Kapil Venkat, Witt William C., Magdău Ioan-Bogdan, Cole Daniel J., and Csányi Gábor, “Mace-off: Short-range transferable machine learning force fields for organic molecules,” Journal of the American Chemical Society (2025). [Google Scholar]
  • [46].Wang Han, Zhang Linfeng, Han Jiequn, and Weinan E, “Deepmd-kit: A deep learning package for many-body potential energy representation and molecular dynamics,” Computer Physics Communications 228, 178–184 (2018). [Google Scholar]
  • [47].Schütt Kristof, Kindermans Pieter-Jan, Sauceda Felix Huziel Enoc, Chmiela Stefan, Tkatchenko Alexandre, and Müller Klaus-Robert, “Schnet: A continuous-filter convolutional neural network for modeling quantum interactions,” Advances in neural information processing systems 30 (2017). [Google Scholar]
  • [48].Gonze Xavier and Lee Changyol, “Dynamical matrices, born effective charges, dielectric permittivity tensors, and interatomic force constants from density-functional perturbation theory,” Physical Review B 55, 10355 (1997). [Google Scholar]
  • [49].Chuin Wei Tan, Descoteaux Marc L., Kotak Mit, de Miranda Nascimento Gabriel, Kavanagh Seán R., Zichi Laura, Wang Menghang, Saluja Aadit, Hu Yizhong R., Smidt Tess, Johansson Anders, Witt William C., Kozinsky Boris, and Musaelian Albert, “High-performance training and inference for deep equivariant interatomic potentials,” arXiv preprint arXiv:2504.16068 (2025). [Google Scholar]
  • [50].Chen Chi and Ong Shyue Ping, “A universal graph deep learning interatomic potential for the periodic table,” Nature Computational Science 2, 718–728 (2022). [DOI] [PubMed] [Google Scholar]
  • [51].Deng Bowen, Zhong Peichen, Jun KyuJung, Riebesell Janosh, Han Kevin, Bartel Christopher J, and Ceder Gerbrand, “Chgnet as a pretrained universal neural network potential for charge-informed atomistic modelling,” Nature Machine Intelligence 5, 1031–1041 (2023). [Google Scholar]
  • [52].Levine Daniel S., Shuaibi Muhammed, Spotte-Smith Evan Walter Clark, Taylor Michael G., Hasyim Muhammad R., Michel Kyle, Batatia Ilyes, Csányi Gábor, Dzamba Misko, Eastman Peter, Frey Nathan C., Fu Xiang, Gharakhanyan Vahe, Krishnapriyan Aditi S., Rackers Joshua A., Raja Sanjeev, Rizvi Ammar, Rosen Andrew S., Ulissi Zachary, Vargas Santiago, Zitnick C. Lawrence, Blau Samuel M., and Wood Brandon M., “The open molecules 2025 (omol25) dataset, evaluations, and models,” arXiv preprint arXiv:2505.08762 (2025). [Google Scholar]
  • [53].Barroso-Luque Luis, Shuaibi Muhammed, Fu Xiang, Wood Brandon M., Dzamba Misko, Gao Meng, Rizvi Ammar, Zitnick C. Lawrence, and Ulissi Zachary W., “Open materials 2024 (omat24) inorganic materials dataset and models,” arXiv preprint arXiv:2410.12771 (2024). [Google Scholar]
  • [54].Passaro Saro and Zitnick C. Lawrence, “Reducing so(3) convolutions to so(2) for efficient equivariant gnns,” arXiv preprint arXiv:2302.03655 (2023). [Google Scholar]
  • [55].Remsing Richard C., Rodgers Jocelyn M., and Weeks John D., “Deconstructing classical water models at interfaces and in bulk,” Journal of Statistical Physics 145, 313–334 (2011). [Google Scholar]
  • [56].Gao Ang, Remsing Richard C., and Weeks John D., “Short solvent model for ion correlations and hydrophobic association,” Proceedings of the National Academy of Sciences 117, 1293–1302 (2020). [Google Scholar]
  • [57].Schmiedmayer Bernhard and Kresse Georg, “Derivative learning of tensorial quantities—predicting finite temperature infrared spectra from first principles,” The Journal of Chemical Physics 161 (2024). [Google Scholar]
  • [58].Avbelj Franc, Grdadolnik Simona Golic, Grdadolnik Joze, and Baldwin Robert L., “Intrinsic backbone preferences are fully present in blocked amino acids,” Proceedings of the National Academy of Sciences 103, 1272–1277 (2006) [Google Scholar]
  • [59].Yuan Eric C.-Y., Liu Yunsheng, Chen Junmin, Zhong Peichen, Raja Sanjeev, Kreiman Tobias, Vargas Santiago, Xu Wenbin, Head-Gordon Martin, Yang Chao, Blau Samuel M., Cheng Bingqing, Krishnapriyan Aditi, and Head-Gordon Teresa, “Foundation models for atomistic simulation of chemistry and materials,” arXiv preprint arXiv:2503.10538 (2025). [Google Scholar]
  • [60].Isert Clemens, Atz Kenneth, Jiménez-Luna José, and Schneider Gisbert, “Qmugs, quantum mechanical properties of drug-like molecules,” Scientific Data 9, 273 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [61].Skinner Lawrie B, Benmore CJ, Neuefeind Joerg C, and Parise John B, “The structure of water around the compressibility minimum,” The Journal of chemical physics 141, 214507 (2014). [DOI] [PubMed] [Google Scholar]
  • [62].Bertie John E. and Lan Zhida, “Infrared intensities of liquids XX: The intensity of the OH stretching band of liquid water revisited, and the best current values of the optical constants of H2O(l) at 25 °C between 15,000 and 1 cm−1,” Applied Spectroscopy 50, 1047–1057 (1996). [Google Scholar]
  • [63].Marsalek Ondrej and Markland Thomas E, “Quantum dynamics and spectroscopy of ab initio liquid water: The interplay of nuclear and electronic quantum effects,” The journal of physical chemistry letters 8, 1545–1551 (2017). [DOI] [PubMed] [Google Scholar]
  • [64].Musil Félix, Zaporozhets Iryna, Noé Frank, Clementi Cecilia, and Kapil Venkat, “Quantum dynamics using path integral coarse-graining,” The Journal of Chemical Physics 157, 181102 (2022) [DOI] [PubMed] [Google Scholar]
  • [65].Haynes William M, CRC handbook of chemistry and physics (CRC press, 2016). [Google Scholar]
  • [66].Linstrom P, “Nist chemistry webbook, nist standard reference database number 69,” J. Phys. Chem. Ref. Data, Monograph 9, 1–1951 (1998). [Google Scholar]
  • [67].Magdău Ioan-Bogdan, Arismendi-Arrieta Daniel J, Smith Holly E, Grey Clare P, Hermansson Kersti, and Csányi Gábor, “Machine learning force fields for molecular liquids: Ethylene carbonate/ethyl methyl carbonate binary solvent,” npj Computational Materials 9, 146 (2023). [Google Scholar]
  • [68].Dahiyat Bassil I. and Mayo Stephen L., “De Novo Protein Design: Fully Automated Sequence Selection,” Science 278, 82–87 (1997). [DOI] [PubMed] [Google Scholar]
  • [69].Barducci Alessandro, Bussi Giovanni, and Parrinello Michele, “Well-Tempered Metadynamics: A Smoothly Converging and Tunable Free-Energy Method,” Physical Review Letters 100, 020603 (2008) [DOI] [PubMed] [Google Scholar]
  • [70].Zhang Shuting, Schweitzer-Stenner Reinhard, and Urbanc Brigita, “Do Molecular Dynamics Force Fields Capture Conformational Dynamics of Alanine in Water?” Journal of Chemical Theory and Computation 16, 510–527 (2020). [DOI] [PubMed] [Google Scholar]
  • [71].Wang Zhi-Xiang and Duan Yong, “Solvation effects on alanine dipeptide: A MP2/cc-pVTZ//MP2/6–31G** study of (ϕ,ψ) energy maps and conformers in the gas phase, ether, and water,” Journal of Computational Chemistry 25, 1699–1716 (2004) [DOI] [PubMed] [Google Scholar]
  • [72].Lindorff-Larsen Kresten, Piana Stefano, Palmo Kim, Maragakis Paul, Klepeis John L., Dror Ron O., and Shaw David E., “Improved side-chain torsion potentials for the Amber ff99SB protein force field,” Proteins: Structure, Function, and Bioinformatics 78, 1950–1958 (2010) [Google Scholar]
  • [73].Jain Anubhav, Ong Shyue Ping, Hautier Geoffroy, Chen Wei, Richards William Davidson, Dacek Stephen, Cholia Shreyas, Gunter Dan, Skinner David, Ceder Gerbrand, and Persson Kristin A., “Commentary: The materials project: A materials genome approach to accelerating materials innovation,” APL Materials 1, 011002 (2013). [Google Scholar]
  • [74].Tran Richard, Lan Janice, Shuaibi Muhammed, Wood Brandon M., Goyal Siddharth, Das Abhishek, Heras-Domingo Javier, Kolluru Adeesh, Rizvi Ammar, Shoghi Nima, Sriram Anuroop, Therrien Félix, Abed Jehad, Voznyy Oleksandr, Sargent Edward H., Ulissi Zachary, and Zitnick C. Lawrence, “The open catalyst 2022 (oc22) dataset and challenges for oxide electrocatalysts,” ACS Catalysis 13, 3066–3084 (2023) [Google Scholar]
  • [75].Weber John L., Guha Rishabh D., Agarwal Garvit, Wei Yujing, Fike Aidan A., Xie Xiaowei, Stevenson James, Leswing Karl, Halls Mathew D., Abel Robert, and Jacobson Leif D., “Efficient long-range machine learning force fields for liquid and materials properties,” arXiv preprint arXiv:2505.06462 (2025). [Google Scholar]
  • [76].Paszke Adam, Gross Sam, Massa Francisco, Lerer Adam, Bradbury James, Chanan Gregory, Killeen Trevor, Lin Zeming, Gimelshein Natalia, Antiga Luca, Desmaison Alban, Kopf Andreas, Yang Edward, DeVito Zachary, Raison Martin, Tejani Alykhan, Chilamkurthy Sasank, Steiner Benoit, Fang Lu, Bai Junjie, and Chintala Soumith, “Pytorch: An imperative style, high-performance deep learning library,” in Advances in Neural Information Processing Systems, Vol. 32 (Curran Associates, Inc., 2019). [Google Scholar]
  • [77].Eastman Peter, Behara Pavan Kumar, Dotson David, Galvelis Raimondas, Herr John, Horton Josh, Mao Yuezhi, Chodera John, Pritchard Benjamin, Wang Yuanqing, De Fabritiis Gianni, and Markland Thomas, “Spice 2.0.1,” Zenodo https://zenodo.org/records/10975225 (2024). [Google Scholar]
  • [78].Sun Qiming, Berkelbach Timothy C., Blunt Nick S., Booth George H., Guo Sheng, Li Zhendong, Liu Junzi, McClain James D., Sayfutyarova Elvira R., Sharma Sandeep, Wouters Sebastian, and Kin-Lic Chan Garnet, “Pyscf: the python-based simulations of chemistry framework,” Wiley Interdisciplinary Reviews: Computational Molecular Science 8, e1340 (2018). [Google Scholar]
  • [79].Larsen Ask Hjorth, Mortensen Jens Jørgen, Blomqvist Jakob, Castelli Ivano E, Christensen Rune, Dułak Marcin, Friis Jesper, Groves Michael N, Hammer Bjørk, Hargus Cory, Hermes Eric D, Jennings Paul C, Jensen Peter Bjerre, Kermode James, Kitchin John R, Kolsbjerg Esben Leonhard, Kubal Joseph, Kaasbjerg Kristen, Lysgaard Steen, Maronsson Jón Bergmann, Maxson Tristan, Olsen Thomas, Pastewka Lars, Peterson Andrew, Rostgaard Carsten, Schiøtz Jakob, Schütt Ole, Strange Mikkel, Thygesen Kristian S, Vegge Tejs, Vilhelmsen Lasse, Walter Michael, Zeng Zhenhua, and Jacobsen Karsten W, “The atomic simulation environment—a Python library for working with atoms,” Journal of Physics: Condensed Matter 29, 273002 (2017) [DOI] [PubMed] [Google Scholar]
  • [80].Kim Sunghwan, Chen Jie, Cheng Tiejun, Gindulyte Asta, He Jia, He Siqian, Li Qingliang, Shoemaker Benjamin A., Thiessen Paul A., Yu Bo, Zaslavsky Leonid, Zhang Jian, and Bolton Evan E., “Pubchem 2025 update,” Nucleic Acids Research 53, D1516–D1525 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [81].Landrum Greg, Tosco Paolo, Kelley Brian, Rodriguez Ricardo, Cosgrove David, Vianello Riccardo, sriniker, Gedeck Peter, Jones Gareth, Kawashima Eisuke, Schneider Nadine, Nealschneider Dan, Dalke Andrew, Swain Matt, Cole Brian, Turk Samo, Savelev Aleksandr, tadhurst-cdd, Vaucher Alain, Wójcikowski Maciej, Take Ichiru, Walker Rachel, Scalfani Vincent F., Faara Hussein, Ujihara Kazuya, Probst Daniel, Lehtivarjo Juuso, godin guillaume, Pahl Axel, and Maeder Niels, “rdkit/rdkit: 2025_03_2 (q1 2025) release,” Zenodo https://zenodo.org/records/15286010 (2025). [Google Scholar]
  • [82].Berendsen HJC, van der Spoel D, and van Drunen R, “Gromacs: A message-passing parallel molecular dynamics implementation,” Computer Physics Communications 91, 43–56 (1995) [Google Scholar]
  • [83].Wang Junmei, Wang Wei, Kollman Peter A, and Case David A, “Automatic atom type and bond type perception in molecular mechanical calculations,” Journal of molecular graphics and modelling 25, 247–260 (2006). [DOI] [PubMed] [Google Scholar]
  • [84].Yoshida Haruo, “Construction of higher order symplectic integrators,” Physics Letters A 150, 262–268 (1990) [Google Scholar]
  • [85].Jorgensen William L., Chandrasekhar Jayaraman, Madura Jeffry D., Impey Roger W., and Klein Michael L., “Comparison of simple potential functions for simulating liquid water,” The Journal of Chemical Physics 79, 926–935 (1983). [Google Scholar]
  • [86].Bonomi Massimiliano, Bussi Giovanni, Camilloni Carlo, Tribello Gareth A., Banáš Pavel, Barducci Alessandro, Bernetti Mattia, Bolhuis Peter G., Bottaro Sandro, Branduardi Davide, Capelli Riccardo, Carloni Paolo, Ceriotti Michele, Cesari Andrea, Chen Haochuan, Chen Wei, Colizzi Francesco, De Sandip, De La Pierre Marco, Donadio Davide, Drobot Viktor, Ensing Bernd, Ferguson Andrew L., Filizola Marta, Fraser James S., Fu Haohao, Gasparotto Piero, Gervasio Francesco Luigi, Giberti Federico, Gil-Ley Alejandro, Giorgino Toni, Heller Gabriella T., Hocky Glen M., Iannuzzi Marcella, Invernizzi Michele, Jelfs Kim E., Jussupow Alexander, Kirilin Evgeny, Laio Alessandro, Limongelli Vittorio, Lindorff-Larsen Kresten, Löhr Thomas, Marinelli Fabrizio, Martin-Samos Layla, Masetti Matteo, Meyer Ralf, Michaelides Angelos, Molteni Carla, Morishita Tetsuya, Nava Marco, Paissoni Cristina, Papaleo Elena, Parrinello Michele, Pfaendtner Jim, Piaggi Pablo, Piccini GiovanniMaria, Pietropaolo Adriana, Pietrucci Fabio, Pipolo Silvio, Provasi Davide, Quigley David, Raiteri Paolo, Raniolo Stefano, Rydzewski Jakub, Salvalaglio Matteo, Sosso Gabriele Cesare, Spiwok Vojtěch, Šponer Jiří, Swenson David W. H., Tiwary Pratyush, Valsson Omar, Vendruscolo Michele, Voth Gregory A., White Andrew, and The PLUMED consortium, “Promoting transparency and reproducibility in enhanced molecular simulations,” Nature Methods 16, 670–673 (2019) [DOI] [PubMed] [Google Scholar]
  • [87].Tribello Gareth A., Bonomi Massimiliano, Branduardi Davide, Camilloni Carlo, and Bussi Giovanni, “PLUMED 2: New feathers for an old bird,” Computer Physics Communications 185, 604–613 (2014). [Google Scholar]
  • [88].Branduardi Davide, Bussi Giovanni, and Parrinello Michele, “Metadynamics with adaptive gaussians,” Journal of chemical theory and computation 8, 2247–2254 (2012). [DOI] [PubMed] [Google Scholar]
  • [89].Bitzek Erik, Koskinen Pekka, Gähler Franz, Moseler Michael, and Gumbsch Peter, “Structural relaxation made simple,” Physical Review Letters 97, 170201 (2006). [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

The training sets, training scripts, and trained potentials are available at https://github.com/ChengUCB/les_fit.

The LES library is publicly available at https://github.com/ChengUCB/les. The CACE package with the LES implementation is available at https://github.com/BingqingCheng/cace. The MACE package with the LES implementation is available at https://github.com/ACEsuit/mace. The NequIP and Allegro LES extension package is available at https://github.com/ChengUCB/NequIP-LES. The MatGL package with the LES implementation is available at https://github.com/ChengUCB/matgl. The UMA package with the LES implementation is available at https://github.com/santi921/fairchem/tree/les_branch.

RESOURCES