Abstract
Coarse-grained (CG) modeling enables molecular simulations to reach time and length scales inaccessible to fully atomistic methods. For classical CG models, the choice of mapping, that is, how atoms are grouped into CG sites, is a major determinant of accuracy and transferability. At the same time, the emergence of machine learning potentials (MLPs) offers new opportunities to build CG models that can in principle learn the true potential of mean force for any mapping. In this work, we systematically investigate how the choice of mapping influences the representations learned by equivariant MLPs by studying liquid hexane, amino acids, and polyalanine. We find that when the length scales of bonded and nonbonded interactions overlap, unphysical bond permutations can occur. We also demonstrate that correctly encoding species and maintaining stereochemistry are crucial, as neglecting either introduces unphysical symmetries. Our findings provide practical guidance for selecting CG mappings compatible with modern architectures and guide the development of transferable CG models.


Introduction
Molecular dynamics (MD) simulations have become an indispensable tool in chemistry, biology, and materials science, offering atomistic insights into the behavior of complex systems. However, a persistent challenge is the vast range of time and length scales governing molecular phenomena. Many processes, such as protein folding or polymer dynamics, occur over microseconds or longer, far exceeding the time scales accessible to conventional all-atom simulations. To bridge this gap, coarse-graining (CG) methods can be used. CG models simplify the system by grouping atoms into fewer interaction sites, or “beads”, thus reducing the effective degrees of freedom. This allows for significantly larger and longer simulations. The central goal of CG models is to preserve the essential structural and thermodynamic properties of the original atomistic system.
When developing a CG model, two key questions typically arise: (1) what functional form should be used to represent the CG potential, and (2) what mapping scheme should be adopted to define how atoms are grouped into beads? In atomistic systems, only the choice of a potential functional form is relevant, and extensive work has focused on improving its accuracy. In the past decade, attention has mostly focused on machine learning potentials (MLPs). Their success is based on a paradigm shift that also occurred in other domains: MLPs learn interactions directly from data instead of relying on the hand-crafted interaction terms of classical potentials.
Many modern MLPs incorporate physical symmetries into the model architecture. In particular, E(3)-equivariant neural networks enforce rotational, translational, and reflection symmetries inherent to molecular systems. Architectures such as NequIP, Allegro, and MACE that were built on these principles have demonstrated exceptional accuracy and data efficiency. The application of equivariant MLPs to CG modeling is a natural and promising extension. An early study of CG liquid water indicates that equivariant MLPs can drastically reduce the amount of training data needed, performing reasonably well with as little as a single reference frame. While single-bead mappings are trivial, it is unclear how more general mapping choices affect the learned representation. In practice, many existing CG-MLPs still rely on priors, i.e., classical energy terms, such as harmonic bond or angle potentials, to maintain molecular connectivity and physical accuracy. This practice reintroduces the manual heuristics that atomistic MLPs were designed to avoid.
In this study, we investigate how modern E(3)-equivariant machine learning potentials can be applied to coarse-graining without the use of any priors. Using the MACE architecture as a representative model, we perform a systematic study to evaluate how the choice of mapping, species encoding, and model parameters influences the stability and learned representation of the model. We compare these results with classical CG potentials to assess whether the learned models provide a more accurate description of CG energy landscapes. We evaluate three distinct systems of increasing complexity: liquid hexane, single amino acids, and a 15-mer polyalanine. We validate key findings by additionally testing them with NequIP.
Our results show that the mapping choice can significantly influence the representation that equivariant MLPs learn. In low-resolution mappings of amino acids, we observe symmetries in the free energy surface (FES). We find that these are caused by enantiomerization or ambiguous species encoding. We show that such symmetries propagate from symmetric dihedrals in single amino acids to incorrect secondary structure formation in larger peptides. Finally, we show that bond permutations can occur when bonded and nonbonded length scales overlap, such as in the two-site hexane or C α polyalanine model. Together, these findings highlight that while equivariant MLPs offer remarkable flexibility and data efficiency, the preservation of a faithful representation critically depends on the CG mapping.
Methods
Mapping in Coarse-Grained Modeling
In CG modeling, the degrees of freedom of an atomistic system are reduced by grouping atoms into beads. Here we limit the study to mappings in which each atom contributes to at most one CG bead. We use the center-of-mass (COM) mapping to derive CG positions and forces. Other, more general mapping choices, as well as optimal mappings, have also been explored, but remain less prevalent in practice. In the COM mapping, the CG positions $\mathbf{R} \in \mathbb{R}^{3N}$ are derived as a linear mapping $M$ of the atomistic positions $\mathbf{r} \in \mathbb{R}^{3n}$, where $N < n$. Specifically, the position of CG bead $I$ is defined by the weighted sum of the coordinates of the atoms assigned to that bead (eq 1).
$$\mathbf{R}_I = \sum_{i \in \mathcal{S}_I} w_{Ii}\, \mathbf{r}_i \tag{1}$$
Here, $\mathcal{S}_I$ denotes the set of atom indices that contribute to the CG bead $I$. In the COM mapping, the contribution $w_{Ii}$ of each atom to the CG bead $I$ is equal to its fraction of the total bead mass (eq 2).
$$w_{Ii} = \frac{m_i}{\sum_{j \in \mathcal{S}_I} m_j} \tag{2}$$
The weights are normalized such that $\sum_{i \in \mathcal{S}_I} w_{Ii} = 1$. In the COM mapping, the effective force $\mathbf{f}_I$ acting on a CG bead is simply the sum of the atomistic forces $\mathbf{f}_i$ of its constituent atoms (eq 3).
$$\mathbf{f}_I = \sum_{i \in \mathcal{S}_I} \mathbf{f}_i \tag{3}$$
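The COM mapping described above can be illustrated with a minimal sketch in plain Python. The function name `com_map`, the toy coordinates, and the bead assignment are hypothetical choices made for illustration and are not taken from the original workflow.

```python
def com_map(positions, forces, masses, beads):
    """Map atomistic positions and forces to CG beads (COM mapping).

    positions, forces: one (x, y, z) tuple per atom
    masses: one mass per atom
    beads: list of atom-index lists; each atom appears in at most one bead
    """
    cg_positions, cg_forces = [], []
    for atom_ids in beads:
        bead_mass = sum(masses[i] for i in atom_ids)
        # Mass-fraction weights; they sum to 1 within each bead.
        weights = [masses[i] / bead_mass for i in atom_ids]
        cg_positions.append(tuple(
            sum(w * positions[i][k] for w, i in zip(weights, atom_ids))
            for k in range(3)))
        # The bead force is the plain sum of the forces on its atoms.
        cg_forces.append(tuple(
            sum(forces[i][k] for i in atom_ids) for k in range(3)))
    return cg_positions, cg_forces


# Toy example (hypothetical numbers): a CH2-like bead and a lone carbon bead.
positions = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (-0.1, 0.0, 0.0), (0.3, 0.0, 0.0)]
forces = [(1.0, 0.0, 0.0), (0.5, 0.0, 0.0), (-0.5, 2.0, 0.0), (0.0, 0.0, 3.0)]
masses = [12.011, 1.008, 1.008, 12.011]
beads = [[0, 1, 2], [3]]
cg_positions, cg_forces = com_map(positions, forces, masses, beads)
```

In this toy case the two hydrogens sit symmetrically around the carbon, so the first bead coincides with the carbon position, while its force is the sum over all three atoms.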
CG models are parametrized in a “top-down” or “bottom-up” manner. Bottom-up methods use a more detailed reference, for example an atomistic simulation or quantum mechanics, to derive the CG potential, while top-down approaches aim to reproduce macroscopic observables, such as experimental measurements. Many models also implement a hybrid approach, for example the MARTINI force field, which parametrizes nonbonded interactions via experimental partition coefficients, while bonded terms are derived from classical atomistic simulations. Here we employ force matching, a bottom-up method, to derive the CG potentials. We utilize force matching as it is the standard for training CG-MLPs and directly captures many-body interactions that structure-based methods, like iterative Boltzmann inversion, often miss. Furthermore, force matching avoids the high computational cost of iterative sampling required by strategies such as relative entropy minimization.
Force Matching
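Force matching, first introduced by Izvekov and Voth as the multiscale coarse-graining (MS-CG) method, optimizes the CG potential by minimizing the mean squared residual $\chi^2$ between the predicted CG forces and the mapped atomistic forces $\mathbf{f}_I$.
$$\chi^2(\theta) = \frac{1}{3N} \left\langle \sum_{I=1}^{N} \left\| \mathbf{f}_I - \mathbf{F}_I(\mathbf{R}, \theta) \right\|^2 \right\rangle \tag{4}$$
Here $\mathbf{F}_I(\mathbf{R}, \theta) = -\nabla_I U_{\mathrm{CG}}(\mathbf{R}, \theta)$ are the predicted forces as a function of the model parameters $\theta$, and $\langle \cdot \rangle$ denotes the average over the reference frames.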
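As a hedged illustration of the force-matching objective, the sketch below fits a single parameter of a toy two-bead harmonic bond. The harmonic model, the grid scan over `k`, and all numbers are hypothetical stand-ins; the study itself trains spline-based and MACE potentials, not this model.

```python
import math


def harmonic_forces(frame, k, r0):
    """Forces on two beads from U = 0.5 * k * (r - r0)**2 (toy CG model)."""
    p1, p2 = frame
    d = tuple(b - a for a, b in zip(p1, p2))
    r = math.sqrt(sum(c * c for c in d))
    # F_1 = -dU/dR_1 = k * (r - r0) * (R_2 - R_1) / r; F_2 is the opposite.
    scale = k * (r - r0) / r
    f1 = tuple(scale * c for c in d)
    f2 = tuple(-c for c in f1)
    return [f1, f2]


def force_matching_loss(frames, ref_forces, k, r0):
    """Mean squared force residual over all frames and force components."""
    total, count = 0.0, 0
    for frame, ref in zip(frames, ref_forces):
        pred = harmonic_forces(frame, k, r0)
        for f_pred, f_ref in zip(pred, ref):
            for a, b in zip(f_pred, f_ref):
                total += (a - b) ** 2
                count += 1
    return total / count


# Hypothetical reference data generated with k = 100 and r0 = 0.25.
frames = [
    [(0.0, 0.0, 0.0), (0.30, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 0.20, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.27)],
]
ref_forces = [harmonic_forces(f, 100.0, 0.25) for f in frames]

loss_at_truth = force_matching_loss(frames, ref_forces, 100.0, 0.25)
# Crude parameter scan standing in for gradient-based training.
best_k = min(range(60, 141, 10),
             key=lambda k: force_matching_loss(frames, ref_forces, k, 0.25))
```

In a realistic bottom-up setting the residual does not vanish, because the mapped atomistic forces contain noise from the removed degrees of freedom.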
Coarse-Grained Potential
Classical Potential
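We employ the VOTCA framework to derive classical CG potentials $U_{\mathrm{Classical}}(\mathbf{R}; \theta)$. The general form is a sum of bonded (bonds, angles, dihedrals) and nonbonded interactions, which are parametrized in the form of splines (eq 5).
$$U_{\mathrm{Classical}}(\mathbf{R}; \theta) = \sum_{\mathrm{bonds}} U_b(b) + \sum_{\mathrm{angles}} U_\theta(\theta) + \sum_{\mathrm{dihedrals}} U_\phi(\phi) + \sum_{\mathrm{nonbonded}} U_{\mathrm{nb}}(R_{IJ}) \tag{5}$$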
We provide a detailed overview of the parametrized interactions and hyperparameters used in the Supporting Information (SI), Table S1.
Machine Learning Potential
To parametrize an equivariant MLP $U_{\mathrm{MLP}}(\mathbf{R}; \theta)$, we employ the MACE architecture. MACE combines the idea of atomic cluster expansion with message-passing neural networks (MPNNs). In MACE, the messages passed between nodes, in our case CG beads, are expanded using a hierarchical body order expansion (eq 6) over neighbors $J$
$$\mathbf{m}_I^{(l)} = \sum_{J} u_1\left(\sigma_I^{(l)}; \sigma_J^{(l)}\right) + \sum_{J_1, J_2} u_2\left(\sigma_I^{(l)}; \sigma_{J_1}^{(l)}, \sigma_{J_2}^{(l)}\right) + \dots + \sum_{J_1, \dots, J_\nu} u_\nu\left(\sigma_I^{(l)}; \sigma_{J_1}^{(l)}, \dots, \sigma_{J_\nu}^{(l)}\right) \tag{6}$$
where $u_1, \dots, u_\nu$ are learnable functions and $\sigma_I^{(l)}$ is the state of CG bead $I$ in layer $l$. The total body order of the model depends on the number of layers $L$ and the correlation order $\nu$.
We use the chemtrain framework to train the MACE potentials. The training process directly adjusts the model parameters based on eq 4 via backpropagation. A detailed overview of the setup and model hyperparameters can be found in the SI.
Simulations
Reference Simulations
Atomistic reference data was generated using GROMACS in the canonical ensemble (NVT) at 300 K. For liquid hexane, we followed the protocol of Ruehle et al., employing the OPLS-AA force field with a system size of 100 molecules. For peptide systems, we employed the AMBER ff99SB-ILDN force field with the TIP3P water model. All peptide structures were capped with N-terminal acetyl (ACE) and C-terminal N-methyl amide (NME) groups. Production runs were performed for 100 ns (liquid hexane) and 500 ns (peptides). The configurations were sampled uniformly to produce 500,000 training frames.
CG Simulations
CG simulations were performed in the NVT ensemble at 300 K. Classical CG simulations were run in GROMACS using stochastic dynamics (SD). MLP simulations were performed using the JAX-MD engine within the chemtrain framework. A time integration step of 2 fs was used for all CG models. For the implicit solvent baseline of capped amino acids, a time step of 0.5 fs was employed. For capped alanine, we also perform stability tests in a microcanonical (NVE) ensemble. Detailed simulation parameters are provided in the SI.
Structural Analysis of Coarse-Grained Simulations
Radial Distribution Function
To describe nonbonded interactions in the liquid hexane model, we employ radial distribution functions (RDFs). The RDF $g_{A–B}(r)$ captures the probability of finding a particle of species A at a distance $r$ from a particle of species B (eq 7). It is defined as the ratio of the local density of A at a distance $r$ from B to the bulk density of A, $\rho_A$
$$g_{A–B}(r) = \frac{1}{\rho_A} \frac{dN_{A–B}(r)}{4 \pi r^2\, dr} \tag{7}$$
where $dN_{A–B}(r)$ is the average number of A particles found in a spherical shell of thickness $dr$ at a distance $r$ from a central particle B, and $\rho_A = N_A / V$ is the bulk number density of species A. Bonded atoms/CG beads are excluded from the RDF calculation.
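A minimal sketch of such an RDF calculation is given below. It assumes a cubic periodic box, two distinct species lists, and an optional set of excluded (bonded) index pairs; the function name `rdf` and the ideal-gas test data are hypothetical and only serve to check the normalization.

```python
import math
import random


def rdf(pos_a, pos_b, box, r_max, n_bins, excluded=frozenset()):
    """g_{A-B}(r) in a cubic periodic box with edge length `box`.

    excluded: set of (index_in_a, index_in_b) pairs to skip, e.g. bonded beads.
    """
    dr = r_max / n_bins
    counts = [0] * n_bins
    for ib, b in enumerate(pos_b):
        for ia, a in enumerate(pos_a):
            if (ia, ib) in excluded:
                continue
            d2 = 0.0
            for k in range(3):
                d = a[k] - b[k]
                d -= box * round(d / box)  # minimum-image convention
                d2 += d * d
            r = math.sqrt(d2)
            if r < r_max:
                counts[min(int(r / dr), n_bins - 1)] += 1
    rho_a = len(pos_a) / box ** 3
    g = []
    for k, c in enumerate(counts):
        # Exact volume of the spherical shell [k*dr, (k+1)*dr).
        shell = 4.0 / 3.0 * math.pi * (((k + 1) * dr) ** 3 - (k * dr) ** 3)
        # Average count per central B particle, divided by the ideal-gas count.
        g.append(c / (len(pos_b) * rho_a * shell))
    return g


# Sanity check on uncorrelated random points, for which g(r) should be near 1.
random.seed(0)
pos_a = [tuple(random.random() for _ in range(3)) for _ in range(300)]
pos_b = [tuple(random.random() for _ in range(3)) for _ in range(300)]
g = rdf(pos_a, pos_b, box=1.0, r_max=0.5, n_bins=10)
```

For a real trajectory the counts would additionally be averaged over frames before normalization.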
Bonded Parameters
For each liquid hexane mapping, we show the bonded order parameter that embodies the most information: the dihedral angle ϕ A–B–B–A for the four-site mapping, the angle θ A–B–A for the three-site mapping, and the bond distance b A–A for the two-site mapping. To compare the conformational space of the different mappings of single capped amino acids, we evaluate the two backbone dihedrals ϕ C ACE–N–C α–C and ψ N–C α–C–N NME. We compare 2D representations of the free energy surface (FES) of both dihedrals. The FES is calculated from the joint probability density P(ϕ, ψ) using the Boltzmann inversion relation F(ϕ, ψ) = −k B T ln P(ϕ, ψ).
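The Boltzmann inversion of a dihedral histogram can be sketched as follows. The bin count, the shift of the minimum to zero, and the use of `None` for unsampled bins are illustrative assumptions and not settings reported in the text.

```python
import math

KB = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def free_energy_surface(phi, psi, n_bins=36, temperature=300.0):
    """F(phi, psi) in kJ/mol on a grid over [-180, 180) degrees.

    Unsampled bins are returned as None; the global minimum is shifted to zero.
    """
    width = 360.0 / n_bins
    hist = [[0] * n_bins for _ in range(n_bins)]
    for a, b in zip(phi, psi):
        i = min(int(((a + 180.0) % 360.0) / width), n_bins - 1)
        j = min(int(((b + 180.0) % 360.0) / width), n_bins - 1)
        hist[i][j] += 1
    n = len(phi)
    kt = KB * temperature
    # Normalized density per square degree, then F = -kT ln P.
    fes = [[None if c == 0 else -kt * math.log(c / (n * width * width))
            for c in row] for row in hist]
    f_min = min(f for row in fes for f in row if f is not None)
    return [[None if f is None else f - f_min for f in row] for row in fes]


# Toy data: three samples in one basin, one sample in another.
phi = [-60.0, -60.0, -60.0, 60.0]
psi = [-45.0, -45.0, -45.0, 45.0]
fes = free_energy_surface(phi, psi)
```

With a 3:1 population ratio, the less populated bin lies k B T ln 3 above the minimum, which is a quick way to check the sign convention.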
Polyalanine Helix
To analyze the helix formation of the polyalanine peptide, we analyzed different order parameters based on the positions of the C α atoms. We adapted the fractional helix content, or helicity, from Rudzinski et al. This metric iterates over all $N_{\mathrm{hel}}$ pairs of C α atoms separated by three bonds, corresponding to the hydrogen-bonding pattern characteristic of α-helices. For each pair, the distance $d_{ij}$ is computed and scored by how close it is to an optimal distance $d_0 = 0.5$ nm with variance $\sigma^2 = 0.02$ nm$^2$. $Q_{\mathrm{hel}}$ ranges from 0, completely unstructured (coil or unfolded), to 1, a perfect α-helix.
$$Q_{\mathrm{hel}} = \frac{1}{N_{\mathrm{hel}}} \sum_{(i,j)} \exp\left( -\frac{(d_{ij} - d_0)^2}{2 \sigma^2} \right) \tag{8}$$
To differentiate left- from right-handed helices, we used the approach by Sidorova et al. This method first constructs vectors $\mathbf{v}_i = \mathbf{R}_{i+1} - \mathbf{R}_i$ between neighboring C α atoms and then sums the mixed product of consecutive triplets of vectors
$$\chi_{\mathrm{hel}} = \sum_{i} \left( \mathbf{v}_i \times \mathbf{v}_{i+1} \right) \cdot \mathbf{v}_{i+2} \tag{9}$$
The sign of χhel determines the handedness, or chirality sign, of the helix. A positive χhel indicates a right-handed helix, while negative values indicate a left-handed helix. The free energy F(χhel, Q hel) was obtained through Boltzmann inversion from the joint probability density. To evaluate the helix propensity of the different mappings, we ranked all reference simulation frames according to |χhel·Q hel|, selected the 100 frames with the lowest values, and performed 5 ns simulations starting from each configuration.
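The two order parameters can be sketched in a few lines of Python. The idealized helix geometry used for the check (radius 0.23 nm, rise 0.15 nm, and 100° per residue) consists of approximate textbook values assumed here for illustration; the unnormalized mixed-product sum is likewise an assumption about the exact form of the handedness index.

```python
import math


def helicity(ca, d0=0.5, sigma2=0.02):
    """Fractional helix content from C-alpha pairs (i, i+3); distances in nm."""
    pairs = [(i, i + 3) for i in range(len(ca) - 3)]
    total = 0.0
    for i, j in pairs:
        d = math.dist(ca[i], ca[j])
        total += math.exp(-(d - d0) ** 2 / (2.0 * sigma2))
    return total / len(pairs)


def handedness(ca):
    """Sum of mixed products (v_i x v_{i+1}) . v_{i+2} of consecutive bond vectors."""
    v = [tuple(q[k] - p[k] for k in range(3)) for p, q in zip(ca, ca[1:])]
    chi = 0.0
    for a, b, c in zip(v, v[1:], v[2:]):
        cross = (a[1] * b[2] - a[2] * b[1],
                 a[2] * b[0] - a[0] * b[2],
                 a[0] * b[1] - a[1] * b[0])
        chi += sum(x * y for x, y in zip(cross, c))
    return chi


def ideal_helix(n, radius=0.23, rise=0.15, turn_deg=100.0, mirror=False):
    """Idealized C-alpha trace; mirror=True flips x to give the mirror image."""
    sign = -1.0 if mirror else 1.0
    coords = []
    for i in range(n):
        angle = math.radians(turn_deg * i)
        coords.append((sign * radius * math.cos(angle),
                       radius * math.sin(angle),
                       rise * i))
    return coords


right = ideal_helix(15)
left = ideal_helix(15, mirror=True)
```

Because the helicity depends only on distances, it is identical for both mirror images; only the mixed product changes sign, which is why both order parameters are needed.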
Simulation Stability
To assess simulation stability, we applied two different methods. For bulk systems (liquid hexane), we consider the conservation of thermal energy, k B T. Specifically, we check whether k B T < 5 kJ/mol (≈2 k B T), where k B is the Boltzmann constant and T = 300 K. If a particular frame exceeds this threshold, that frame and all subsequent frames are excluded. We found that the stability classification was insensitive to the exact threshold value within the range of 5–20 kJ/mol (SI, Tables S6 and S7).
For the capped amino acids and polyalanine system, we adapt the distance-based stability criterion of Fu et al., by checking whether the distance between the bonded atoms deviates by more than 0.05 nm from the equilibrium bond length. The equilibrium bond length is calculated as the mean of the reference simulation for all matching bond types and frames.
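A distance-based check of this kind might look like the sketch below. The function name, the toy trajectory, and the choice to truncate the trajectory at the first failing frame are assumptions for illustration.

```python
import math


def first_unstable_frame(trajectory, bonds, eq_lengths, tol=0.05):
    """Index of the first frame in which any bond deviates by more than
    `tol` (nm) from its equilibrium length, or None if every frame passes."""
    for t, frame in enumerate(trajectory):
        for (i, j), b0 in zip(bonds, eq_lengths):
            if abs(math.dist(frame[i], frame[j]) - b0) > tol:
                return t
    return None


# Toy two-bead trajectory (hypothetical): the bond stretches in the last frame.
trajectory = [
    [(0.0, 0.0, 0.0), (0.38, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.40, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.45, 0.0, 0.0)],
]
bonds = [(0, 1)]
eq_lengths = [0.38]  # in practice: mean bond length from the reference simulation

t_fail = first_unstable_frame(trajectory, bonds, eq_lengths)
stable_part = trajectory if t_fail is None else trajectory[:t_fail]
```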
We chose two different strategies, as a distance-based metric is not applicable to all mappings of bulk hexane: Bond distances might change drastically, while thermal energy is preserved (see Results). In the case of polyalanine, the distances might also change due to unphysical bond switching; however, the definition of the polyalanine order parameters depends on the correct order of the C α beads. The results of the stability analysis are provided in SI Table S8.
Well-Tempered Metadynamics
To assess the free energy barrier for enantiomerization, we employ well-tempered metadynamics (WTMetaD). Unlike standard metadynamics, which builds a bias potential by adding Gaussians of constant height, WTMetaD ensures convergence by rescaling the Gaussian height according to the accumulated bias (eq 10). The history-dependent bias potential $V(\mathbf{s}, t)$ acting on a set of collective variables in the CG space $\mathbf{s}(\mathbf{R})$ is given by
$$V(\mathbf{s}, t) = \sum_{t' < t} h \exp\left( -\frac{V(\mathbf{s}(t'), t')}{k_B T (\gamma - 1)} \right) \exp\left( -\sum_{i} \frac{\left( s_i - s_i(t') \right)^2}{2 \sigma_i^2} \right) \tag{10}$$
Here $h$ is the initial Gaussian height, $\sigma_i$ is the width of the $i$-th collective variable, and $\gamma = T^*/T$ is the bias factor. This corresponds to sampling the collective variables at an effective temperature $T^* = \gamma T$.
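The height rescaling that distinguishes WTMetaD from standard metadynamics can be sketched for a single collective variable. The parameter values below (initial height, width, and bias factor) are hypothetical and do not correspond to the biasing protocol given in the SI.

```python
import math

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def bias(s, centers, heights, sigma):
    """Accumulated bias V(s) as a sum of the deposited Gaussians."""
    return sum(h * math.exp(-(s - c) ** 2 / (2.0 * sigma ** 2))
               for c, h in zip(centers, heights))


def deposit(s_now, centers, heights, h0, sigma, gamma, temperature):
    """Add one Gaussian at s_now, rescaled by the bias already present there."""
    v = bias(s_now, centers, heights, sigma)
    heights.append(h0 * math.exp(-v / (KB * temperature * (gamma - 1.0))))
    centers.append(s_now)


# Repeatedly depositing at the same CV value shows the decaying Gaussian height.
centers, heights = [], []
for _ in range(5):
    deposit(0.0, centers, heights, h0=0.3, sigma=0.1, gamma=10.0, temperature=300.0)
```

In an actual simulation, `s_now` would be the instantaneous value of the collective variable at each deposition step, and the force from the bias would be added to the model forces.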
Results and Discussion
Liquid Hexane
We first explore liquid hexane, for which classical CG models have been extensively studied. We explore three different mappings: (i) a four-site model, in which A-type beads are placed at carbons 2 and 5 and B-type beads are placed at the central carbons 3 and 4; (ii) a three-site model, in which each pair of consecutive carbons and their hydrogens forms a bead; and (iii) a two-site model, in which each n-propyl end, that is, three consecutive carbons and their associated hydrogens, forms a bead. A visual representation can be found in Figure 1. For each mapping, we derive a classical potential and an MLP using MACE.
Figure 1. Structural properties of the reference, classical, and MLP models for different CG representations of hexane. The first row shows bonded population density metrics: dihedral (four-site model), angle (three-site model), and bond distance (two-site model). For the two-site MLP simulation, we show the nearest-neighbor distance instead of bond lengths. The second row shows the RDF of A–A beads. All results show the mean ± 3 standard deviations of 10 × 1000 ps simulations.
Four-Site Model
The results for the four-site model can be seen in the first column of Figure 1. Overall, both the classical potential and the MLP are able to replicate the shape of the dihedral population. However, the classical potential overestimates the gauche conformation of hexane with a significant increase in sampling around ±70 deg. The RDF error is also much smaller for the MLP. We further evaluated the angular distribution functions (ADFs) for A–A–A and B–B–B triplets, which confirm that the MLP accurately captures many-body correlations that are misrepresented by the classical potential (SI, Figure S9).
Three-Site Model
For the three-site model, the classical potential fails to sample the A–B–A angle correctly. This mismatch when using force matching with a classical potential has also been observed in other studies of hexane. The reason why force matching fails for the classical potential is that the sum of angle, bond, and nonbonded terms does not consider out-of-plane forces that are produced by a dihedral reorientation in the atomistic model. The functional form can only capture the forces that lie in the plane in which the angle θ is defined.
The MLP is able to closely match the RDF and angular distribution. We tested the effect of the number of message-passing layers $L$ and correlation order $\nu$ on the ability of the MACE model to reproduce the A–B–A angle correctly (Figure 2a). The total body order of a MACE model is determined by $\nu$ and $L$ through $\sum_{l=0}^{L} \nu^l$. In the simplest case, $L = 1$ and $\nu = 1$, the total body order is 2 and MACE is unable to capture the angular distribution correctly. Increasing the number of layers or the correlation order allows the MACE model to capture the interaction correctly.
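As a small worked example of this relation, the helper below evaluates the sum for a few combinations of layers and correlation order. The function name is hypothetical, and the summation limits are inferred from the stated special case (L = 1, ν = 1 gives a body order of 2).

```python
def total_body_order(num_layers, correlation_order):
    """Total body order as the sum of correlation_order**l for l = 0..num_layers."""
    return sum(correlation_order ** l for l in range(num_layers + 1))


# (L, nu) combinations and the resulting total body order.
body_orders = {(L, nu): total_body_order(L, nu)
               for L in (1, 2) for nu in (1, 2, 3)}
```

Under this relation, any model with L > 1 or ν > 1 reaches a body order of at least 3, the minimum needed to resolve an angle.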
Figure 2. (a) Angular distribution of the hexane three-site model and (b) RDF of the two-site model for different correlation orders ν and number of message-passing layers L. In the two-site liquid hexane model, the length scales of bonded and nonbonded interactions overlap. Results show the mean ± 3 standard deviations of 10 × 1000 ps simulations.
Two-Site Model
The classical potential performs well in reproducing both bonded and nonbonded interactions of the two-site mapping. For the MLP, an interesting behavior was observed: the bond partners switch over the course of the simulation. This means that an analysis of the bond lengths based on the initial bond list leads to a broadened and diffuse distribution (SI, Figure S1). If instead the closest neighbor of each bead is used for the bond length analysis, the distribution closely matches the reference (Figure 1). The impact of the bond switches is also visible in the RDF, where partners based on the initial bond list are excluded. Two clear peaks are visible at 0.35 and 0.4 nm, an artifact of the newly bonded partners. We tested the influence of ν and L on this behavior but could not find any dependence (Figure 2b). We also tested the two-site hexane model with NequIP, observing the same result (SI, Figure S1). While the trends in Figure 2b might suggest that heavier models could alleviate the bond swaps, extended 10 ns simulations using an even heavier model confirm that increased model expressivity only delays the onset of the behavior rather than resolving the underlying mapping ambiguity (SI, Figure S10).
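The difference between the two bond-length analyses can be illustrated with a toy configuration in which two molecules have exchanged partners. The coordinates are hypothetical and periodic boundary conditions are ignored for brevity.

```python
import math


def bond_list_distances(frame, bonds):
    """Distances between beads that were bonded in the initial topology."""
    return [math.dist(frame[i], frame[j]) for i, j in bonds]


def nearest_neighbor_distances(frame):
    """Distance from each bead to its closest other bead, ignoring topology."""
    return [min(math.dist(p, q) for j, q in enumerate(frame) if j != i)
            for i, p in enumerate(frame)]


# Initial topology: beads (0, 1) and (2, 3) form the two molecules.
bonds = [(0, 1), (2, 3)]
# Hypothetical frame after a bond swap: 0 now sits next to 2, and 1 next to 3.
frame = [(0.0, 0.0, 0.0), (0.0, 0.6, 0.0), (0.25, 0.0, 0.0), (0.25, 0.6, 0.0)]

by_bond_list = bond_list_distances(frame, bonds)
by_nearest = nearest_neighbor_distances(frame)
```

The bond-list analysis reports stretched "bonds" of 0.6 nm, while the nearest-neighbor analysis recovers a bond-like distance of 0.25 nm for every bead, mirroring the broadened versus reference-like distributions described above.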
Model Parameters
Lastly, we evaluate how the cutoff radius r cut and the correlation order ν affect the accuracy, computational performance, and stability of the MACE potential (Table 1). The computational cost of our models is comparable to that of other CG-MLP models. Overall, increasing either the correlation order or the cutoff radius reduces both RDF and force errors. Note that force errors can only be compared between models using the same mapping, as the mapping itself introduces an irreducible noise. A larger cutoff radius also substantially improves stability, though at the expense of higher computational cost. Similarly, reducing model complexity improves speed; however, classical CG models remain over an order of magnitude faster, achieving around 1600, 2130, and 3470 ns/d for the four-, three-, and two-site models, respectively.
Table 1. Comparison of a 2-Layer MACE Model with Different Correlation Orders ν and Average Number of Neighbors N avg
Results are based on 50 × 1000 ps simulations. Green highlighting indicates better performance (arrows indicate direction: ↑ higher is better, ↓ lower is better). “–” indicates that not enough sample statistics were available. “Force” denotes the Mean Absolute Error (MAE) of the force components evaluated on the validation set (50,000 frames).
Units: Force in meV/Å, stability in ps, g A–A, g A–B, g B–B in 10^−3. Speed is listed in ns/d, based on a single 1000 ps simulation. Subscripts show standard deviations, if applicable. * signifies the model shown in Figure 1.
Capped Amino Acids
In recent decades, many different CG representations of amino acids have been developed for peptide and protein-related simulations. Mappings can range from near-atomistic representations that only remove hydrogens but preserve heavy-atom movements, to single-bead mappings, in which each residue is represented by one bead, for example the C α-atom. First, we discuss the widely studied capped l-alanine system (alanine dipeptide). This system has been extensively studied using different machine learning potentials and, more recently, generative models. We explore nine different mappings, ranging from an atomistic implicit solvent model to five-bead mappings, which preserve only the central backbone dihedrals. An overview of the mappings can be seen in Figure 3. Fully explicit solvent MLPs were not considered, as they can easily become more computationally expensive than the underlying classical atomistic reference, which would negate the efficiency gains sought through CG modeling. Multiscale or adaptive resolution schemes remain potential alternatives for future work. For each mapping, we train a MACE model and analyze the Ramachandran plot of the backbone dihedrals. We also perform 100 × 100 ps NVE simulations with different time steps to analyze simulation stability (Supporting Information).
Figure 3. Results of the capped alanine CG simulations with different mappings. The top and bottom rows present the high- and low-resolution mappings, respectively. Each mapping includes a Ramachandran plot derived from 100 × 5 ns simulations.
High-Resolution Mappings
Among the high-resolution mappings considered, the implicit solvent mapping retains all atoms of the capped alanine while removing the solvent molecules that make up the bulk of the system. Further simplification is achieved through Heavy and United Atom mappings, which either drop or merge hydrogens with heavy atoms, respectively. These high-resolution approaches are widely adopted because they preserve essential atomistic dynamics and therefore allow direct comparisons with fully atomistic models. All high-resolution mappings are capable of accurately modeling the backbone dihedrals (Figure 3). In the NVE simulations, the implicit solvent model remains stable up to time steps of 0.5 fs, while the United and Heavy Atom models can be run safely up to 3 fs.
Low-Resolution Mappings
We further investigated low-resolution mappings, which only preserve the atoms essential to the backbone dihedrals, namely the five backbone atoms C ACE–N–C α–C–N NME in the “Core” mapping, and additionally the C β atom in the “Core Beta” mapping. These mappings are frequently adopted as they define the minimal mapping necessary to capture the essential conformational landscape of the protein backbone. For both the Core and Core Beta mapping, we test three different species assignments for the CG beads: the atomistic element, a unique species for each bead (Map II), and a single species for all beads (Single).
The Core mappings show a clear point symmetry around the center (Figure 3, column 2). This indicates that the CG molecule freely switches between the l- and d-enantiomer during the simulation. Since both the C α-hydrogen and the entire side chain are removed, the formal definition of chirality is lost, and a transition between enantiomers becomes an unhindered rotation of the backbone dihedrals. Adding the C β atom resolves this issue, and the CG molecule stays in the original l-enantiomer. We observed no noticeable improvement in terms of structural accuracy or stability when using the more detailed species assignment of Map II. A more subtle symmetry was observed when using a single species for all beads. In this case, the model fails to capture the direction of the molecule, which results from an indistinguishability of local environments. We provide these findings, together with evaluations on three other amino acids, in the SI Figures S4 and S11.
Chiral Inversion
Lastly, we show that CG amino acid models can undergo chiral inversion, i.e., they transition between enantiomers, even if only a single atom is removed around the stereocenter. For that, we calculate the free energy barrier for the transition by biasing the improper C α-dihedral using WTMetaD. In all cases, we only train on l-alanine. We provide the exact biasing protocol in the SI.
In nature, the direct transition between enantiomers is physically blocked by a high-energy planar transition structure. Consequently, enantiomerization typically requires reactive intermediates. The atomistic (implicit solvent) MLP accurately describes this behavior, and we were unable to observe enantiomerization up to 18 kcal/mol. We find that further increasing the bias factor renders the simulation unstable before any transition can be observed. Classical atomistic force fields would exhibit a similarly insurmountable barrier, as chiral stability is enforced by the combined constraints of harmonic angular potentials or explicit improper dihedral terms.
By removing one atom around the chiral center, in this case the C α-hydrogen, enantiomerization becomes accessible (Figure 4a). For the Heavy Atom and Core Beta mappings, we find a transition barrier of approximately 13 kcal/mol. The United Atom mapping presents a slightly larger barrier of 16 kcal/mol. This increase arises because merging heavy atoms with their hydrogens shifts the bead COM away from the stereocenter, thereby stabilizing the initial configuration. This also slightly shifts the minima of the improper dihedral.
Figure 4. (a) Free energy barrier for enantiomerization in atomistic and coarse-grained MLPs, determined using WTMetaD along the improper C α-dihedral. (b) Mechanism of chiral inversion with high-energy planar transition state.
The calculated free energy barriers are very high compared to the thermal energy available of ∼0.6 kcal/mol at 300 K, making transitions very unlikely. At this barrier height, transitions take on the order of micro- to milliseconds. However, increasing the temperature would drastically increase the transition rate, further restricting the state point dependence of the CG potential.
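The temperature sensitivity mentioned above can be made concrete with a rough Arrhenius-type estimate. The sketch assumes that the barrier height and the kinetic prefactor do not change with temperature, which is a simplification, and the comparison temperature of 400 K is a hypothetical choice.

```python
import math

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def boltzmann_factor(barrier, temperature):
    """exp(-barrier / kT) for a barrier in kcal/mol and a temperature in K."""
    return math.exp(-barrier / (KB * temperature))


kt_300 = KB * 300.0  # thermal energy at 300 K in kcal/mol

# Relative change in the crossing probability for a 13 kcal/mol barrier
# when heating from 300 K to a hypothetical 400 K.
speedup = boltzmann_factor(13.0, 400.0) / boltzmann_factor(13.0, 300.0)
```

Under these assumptions, heating by 100 K raises the Boltzmann factor of the 13 kcal/mol barrier by roughly two orders of magnitude.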
Helix Formation of Polyalanine
As a last experiment, we extend the low-resolution mappings from capped alanine to a capped 15-mer alanine. Polyalanine systems are good examples of peptides that spontaneously form α-helices. While naturally occurring l-amino acid peptides form right-handed helices (Figure 5a), it has been experimentally shown that a d-alanine peptide forms the mirror image, a left-handed helix.
Figure 5. (a) A chain of d-alanine forms left-handed helices, while the naturally occurring l-alanine forms right-handed helices. (b) Helicity and handedness of a 500 ns reference simulation of a capped 15-mer l-alanine.
We tested three different mappings: the Core and Core Beta mapping from the capped alanine example, and a C α mapping, which only preserves the C α atom of each alanine residue. The C α mapping is a popular choice for modeling large protein systems. In the C α mapping, we gave each C α a different species. We also performed experiments on three other possible choices: a single species, alternating species, and a symmetric species arrangement. Using unique or alternating species causes unphysical bond switches and ultimately instabilities. We provide the results of all C α mappings in the SI Figure S13.
The helix formation of polyalanine can be described via two order parameters: an index of α-helicity Q hel (eq 8) and a chirality/handedness index χhel (eq 9). In the reference atomistic simulation, a clear population of partially folded right-handed α-helices can be seen (χhel > 0 and Q hel > 0). Although left-handed helices are visible, these are not proper α-helices (Figure 5b).
When investigating the helicity and handedness of the CG simulations, clear symmetries can be observed (Figure 6). The C α mapping is completely symmetric about χhel = 0, which means that the model does not have a preference for helix handedness. The Core mapping also shows increased sampling of incorrect left-handed helices, even though a preference for the true right-handed helices can be observed. This might be because the starting frames were taken from the l-polyalanine reference simulation and are thus favorably oriented to form the right-handed helix. The Core Beta mapping captures the correct helix formation.
Figure 6. Helicity and handedness of 100 × 5 ns trajectories based on different CG mappings. Left: C α, where each bead has a unique species embedding. Middle: Core. Right: Core Beta. The top row shows a graphical representation of the ideal helix and per-residue mapping.
Conclusions
Our systematic evaluation of CG models for liquid hexane, amino acids, and polyalanine reveals that the interplay between mapping and model architecture greatly impacts the learned representation of equivariant MLPs. Classical potentials rely on fixed functional forms, which work well for the two-site hexane model but lack the flexibility to capture the more complex many-body correlations present in the three- and four-site liquid hexane models. Although higher-order interactions can in principle be included in classical CG force fields, their number grows combinatorially with interaction order, making both functional form design and parametrization difficult. As a result, most approaches either truncate the expansion at three-body terms or project higher-order many-body effects onto effective pairwise interactions. In contrast, MLPs overcome this barrier by learning many-body interactions directly from data. The MLP performs well for the three- and four-site mappings of hexane, accurately recovering both bonded and nonbonded interactions.
Because MLPs do not include explicit topological information, they cannot distinguish bonded neighbors from nonbonded neighbors. In atomistic systems, these interactions are naturally separable because of their different length scales. In contrast, for CG representations such as the two-site liquid hexane or polyalanine C α model, we showed that these length scales overlap. In these cases, the model fails to distinguish particles, which leads to unphysical bond permutations and, in the case of the polyalanine C α model, ultimately to instabilities.
We showed that high-resolution mappings for amino acids can accurately capture atomistic backbone dynamics. However, when further coarsening the representation, we observed spurious symmetries. While equivariant MLPs can distinguish between enantiomers through parity-sensitive features, they will assign the same energy to both mirror images if the output is constrained to a scalar energy. In atomistic systems, this is a correct and desired symmetry, since the transition between enantiomers is correctly modeled with a practically infinite energy barrier. However, we showed that upon removal of the C α-hydrogen, enantiomerization becomes accessible through a high-energy transition state. If additionally the side chain is removed, the molecule loses its formal chirality and a symmetric FES is obtained. Additionally, bead species have to be chosen carefully, as indistinguishability of local environments can lead to further symmetries.
We confirmed key findings with NequIP, a different E(3)-equivariant MLP. Our findings likely also extend to invariant graph neural networks, such as SchNet or DimeNet, which rely on up to three-body scalar invariants (distances and angles). These models are blind to chirality, making them susceptible to the enantiomerization we observed. Finally, bond permutations are likely to affect any architecture that constructs neighborhoods purely based on geometric information, without topological information.
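The blindness of distance-based descriptors to chirality can be checked with a short geometric sketch. The four-point configuration below is hypothetical; it merely shows that a structure and its mirror image share all pairwise distances, while a signed dihedral changes sign.

```python
import math
from itertools import combinations


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pair_distances(points):
    """All pairwise distances, i.e. two-body scalar invariants."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees; it changes sign under reflection."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    norm_b2 = math.sqrt(dot(b2, b2))
    m1 = cross(n1, tuple(c / norm_b2 for c in b2))
    return math.degrees(math.atan2(dot(m1, n2), dot(n1, n2)))


# Hypothetical four-bead configuration and its mirror image (z -> -z).
original = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.3, 1.0, 0.8)]
mirrored = [(x, y, -z) for x, y, z in original]

same_distances = all(
    abs(a - b) < 1e-12
    for a, b in zip(pair_distances(original), pair_distances(mirrored)))
angle_original = dihedral(*original)
angle_mirrored = dihedral(*mirrored)
```

Any energy built only from such distances (and the angles derived from them) is therefore identical for both enantiomers, whereas a term depending on the signed dihedral, such as an improper dihedral prior, can tell them apart.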
Overall, equivariant MLPs outperform classical potentials regarding expressivity and data efficiency, theoretically allowing them to learn the potential of mean force for any mapping. In practice, however, our results show that their applicability is limited by mapping-induced artifacts. In scenarios where topological conservation is secondary, such as liquid hexane systems, MLPs excel at reproducing structural distributions while requiring minimal preparation. However, caution is required when applying MLPs to systems where topology is critical, such as CG protein models. In these contexts, current MLPs are generally restricted to implicit solvent or Heavy Atom models. Coarser representations, from the Core and Core Beta mappings to the Cα model, risk introducing overlapping length scales or spurious symmetries that lead to simulation instabilities. We confirmed that incorporating a harmonic bond prior eliminates unphysical bond swaps, and similar terms such as excluded volume or improper dihedrals could likely resolve chiral inversion. However, these priors introduce significant trade-offs: stiff potentials can limit the maximum stable integration time step, and their manual parametrization becomes increasingly difficult as the number of species grows. While priors provide a practical stabilization strategy, our findings highlight the need for architectures that inherently incorporate molecular topology to maintain physical consistency.
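As a rough illustration of the prior and its trade-off, the following NumPy sketch evaluates a harmonic bond term over an explicit bond list and the oscillation period of an isolated harmonic bond. The force constant `k`, equilibrium length `r0`, reduced mass, and geometry are hypothetical values in reduced units. They are not the parameters used in this work.

```python
import numpy as np

def harmonic_bond_prior(positions, bonds, k, r0):
    """Prior energy 0.5 * k * sum_b (r_b - r0)^2 over an explicit bond list."""
    bonds = np.asarray(bonds)
    r = np.linalg.norm(positions[bonds[:, 0]] - positions[bonds[:, 1]], axis=1)
    return 0.5 * k * float(np.sum((r - r0) ** 2))

def bond_period(k, reduced_mass):
    """Period 2*pi*sqrt(mu/k) of an isolated harmonic bond."""
    return 2.0 * np.pi * np.sqrt(reduced_mass / k)

# Two two-site molecules with bonds 0-1 and 2-3 (hypothetical geometry).
intact = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.2, 0.0],
    [1.0, 1.2, 0.0],
])
bonds = [(0, 1), (2, 3)]

# Same bead positions, but beads 1 and 2 have exchanged places.
permuted = intact.copy()
permuted[[1, 2]] = permuted[[2, 1]]

e_intact = harmonic_bond_prior(intact, bonds, k=100.0, r0=1.0)
e_permuted = harmonic_bond_prior(permuted, bonds, k=100.0, r0=1.0)

print("prior energy, intact:  ", e_intact)
print("prior energy, permuted:", e_permuted)
print("period ratio for 4x stiffer bond:",
      bond_period(400.0, 1.0) / bond_period(100.0, 1.0))
```

Unlike a purely geometric energy, the prior distinguishes the two states because the bond list enters explicitly. The period scales as 1/sqrt(k), so a four times stiffer bond halves the fastest vibrational period, which in turn tightens the limit on a stable integration time step.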
Acknowledgments
Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them. This work was funded by the ERC (StG SupraModel)–101077842.
The code and data supporting this study are publicly available at https://github.com/tummfm/CG-Mapping-Benchmark. The training framework chemtrain is publicly available at https://github.com/tummfm/chemtrain.
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.5c03035.
Additional simulation details; classical and machine-learning model parameters; training and stability analysis; theoretical explanation for symmetries; additional analysis on other single capped amino acids; results of other species choices for C α polyalanine (PDF)
F.G.: Data curation, formal analysis, investigation, methodology, software, writing - original draft, writing - review and editing, visualization. J.Z.: Conceptualization, funding acquisition, resources, supervision, writing - review and editing.
The authors declare no competing financial interest.
Published as part of Journal of Chemical Information and Modeling special issue “Enhancing Coarse-Grained Models through Machine Learning”.