New and Notable perspective
Molecular simulations can help rationalize and interpret experimental results, as well as generate hypotheses and predictions that can be tested experimentally. The advantage of studying systems in silico is that one has full control over all system parameters and access to complete information about every particle in the system. The drawback is that computational models are only approximations of the actual system, so their results must be interpreted with care. Fully atomistic models (Fig. 1 a) represent each atom as an individual particle and hence provide the highest level of resolution. However, typical fully atomistic simulations reach timescales on the order of microseconds, whereas many biologically relevant molecular interactions happen on timescales of seconds or longer, which remain out of reach of fully atomistic approaches. Additionally, the force field parameterization of these models is an ongoing effort, where fine-tuning to reproduce a specific property can cause artifacts and errors in the description of other properties of the system (1).
Figure 1.
Examples of coarse graining. (a) A fully atomistic representation of a DNA duplex. (b) A nucleotide-level coarse-grained model, where one simulation particle represents an entire nucleotide (with multiple interaction sites per particle). (c) Higher-level coarse graining, where one red particle in the bead chain represents several basepairs of a DNA duplex and a blue sphere represents a coarse-grained protein.
During the past few years, AI techniques have been rapidly adopted in biology and chemistry, driven by AlphaFold and its ability to predict, with high accuracy, the three-dimensional structures of folded proteins from their sequence (2). These tools continue to have a transformative impact on structural biology and hold great promise, but they have not displaced molecular simulations, because their predictions provide only a static picture of a 3D complex, not the folding kinetics or the details of interaction dynamics.
So what can we do about systems that are too large for fully atomistic simulations and, at the same time, lack enough data to train black-box deep neural network approaches, or about systems whose assembly kinetics we want to study? For a long time, the molecular simulation community has addressed such problems with coarse-grained models (Fig. 1, b and c). These models reduce the resolution at which the studied system is represented. For example, in the case of DNA, a single particle can represent an entire group of basepairs. The coarse-graining process further involves parameterizing the interactions between the particles. Generally, two approaches are applied. In the so-called “bottom-up” approach, the interactions are parameterized against higher-resolution models, such as fully atomistic simulations, and are typically chosen to reproduce a particular property of the higher-resolution model (e.g., the mean force between groups of atoms). The alternative “top-down” approach designs the interactions between particles to reproduce certain measured properties of the system (such as the persistence length of a DNA duplex). The top-down approach does not necessarily establish a direct connection to the microscopic properties of the studied system, such as a correspondence between the positions of the coarse-grained beads and the respective atoms. Making a good coarse-grained model that is computationally tractable and able to make accurate, testable predictions still remains, at least in part, an art driven by experience and intuition.
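As a minimal sketch of a top-down parameterization, the snippet below chooses the bending stiffness of a bead-chain model so that it reproduces a target persistence length. It assumes a hypothetical bending energy U = κ(1 − cos θ) per joint (κ in units of kT), for which the mean cosine between consecutive bonds is the Langevin function coth κ − 1/κ and tangent correlations decay with persistence length l_p = −b / ln⟨cos θ⟩; excluded volume and any other terms of a real model are ignored. The bond length b = 10 nm (roughly 30 basepairs) is an illustrative choice, and 50 nm is the commonly quoted persistence length of double-stranded DNA.

```python
import math


def langevin(kappa: float) -> float:
    """Mean cosine <cos theta> between consecutive bonds for a bending energy
    U = kappa * (1 - cos theta), with kappa in units of kT."""
    return 1.0 / math.tanh(kappa) - 1.0 / kappa


def persistence_length(kappa: float, bond_length: float) -> float:
    """Tangent correlations decay as <cos theta>**n = exp(-n * b / lp),
    hence lp = -b / ln<cos theta> (same length unit as bond_length)."""
    return -bond_length / math.log(langevin(kappa))


def fit_bending_stiffness(target_lp: float, bond_length: float,
                          lo: float = 0.1, hi: float = 1e3,
                          tol: float = 1e-10) -> float:
    """Bisection for kappa such that persistence_length(kappa) == target_lp.
    Valid because lp increases monotonically with kappa."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if persistence_length(mid, bond_length) < target_lp:
            lo = mid  # chain too flexible: stiffen
        else:
            hi = mid  # chain too stiff: soften
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Hypothetical bead spacing of 10 nm and a DNA-like target of 50 nm.
kappa = fit_bending_stiffness(target_lp=50.0, bond_length=10.0)
print(f"kappa = {kappa:.3f} kT")
```

For these numbers the fitted stiffness is roughly 5.5 kT, slightly above the stiff-chain estimate κ ≈ l_p/b = 5; the exact Langevin relation is used so that the match also holds for coarse beads where that approximation degrades.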
In this issue of Biophysical Journal, de Jager et al. (3) present a new coarse-grained model for DNA-protein interactions, an elegant application of the top-down coarse-graining approach. They develop a protein-DNA interaction model that is complemented by experimental data, which are used for model parameterization and validation. In their paper, titled “Exploring protein-mediated compaction of DNA by coarse-grained simulations and unsupervised learning,” the authors study the compaction of DNA by HIV integrase proteins, a process that some of them have previously studied experimentally (4). The studied system consists of DNA duplexes of up to nine thousand basepairs and proteins that can bind to different regions of the DNA duplex. The size of the system, as well as the rare-event nature of the process (integrase proteins diffusing and binding to multiple different DNA segments), clearly rules out fully atomistic modeling, which can capture neither the system size nor the timescales involved. The authors instead use coarse graining to develop a simple and efficient model that still captures the physics relevant to the dynamics of the studied system. The DNA duplex is represented as a chain of beads parameterized to behave like a worm-like chain with the persistence length of DNA (Fig. 1 c). The integrase tetramer, which can form up to four bonds with the DNA duplex, is modeled as a single spherical particle with a short-range attractive potential. The authors implement a bond-swapping potential, originally introduced for patchy particle simulations by Sciortino (5), which ensures that the total DNA-protein interaction energy of each integrase corresponds to at most four bonds. The bond-swapping potential allows intermediate states in which more than four DNA segments are bound to a protein, but the total interaction energy then still corresponds to four bonds or fewer.
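The four-bond cap can be illustrated with a deliberately simplified stand-in. The sketch below is neither Sciortino's three-body potential (5) nor the authors' implementation; it simply counts only the four strongest protein-DNA pair attractions, so a fifth DNA segment can come into contact while the original four are still bound without the energy ever exceeding that of four bonds. The linear pair attraction, its cutoff, and all parameter values are hypothetical, in reduced units.

```python
MAX_BONDS = 4  # an integrase tetramer binds at most four DNA segments


def pair_attraction(r: float, epsilon: float = 1.0,
                    r_contact: float = 1.0, r_cut: float = 1.5) -> float:
    """Hypothetical short-range attraction between a protein and one DNA bead:
    -epsilon at contact, rising linearly to zero at r_cut (reduced units)."""
    if r >= r_cut:
        return 0.0
    if r <= r_contact:
        return -epsilon
    return -epsilon * (r_cut - r) / (r_cut - r_contact)


def capped_binding_energy(distances: list[float],
                          max_bonds: int = MAX_BONDS) -> float:
    """Total protein-DNA energy counting only the max_bonds most negative
    pair terms: extra contacts are allowed but add no further attraction."""
    terms = sorted(pair_attraction(r) for r in distances)  # most negative first
    return sum(terms[:max_bonds])


# A bond swap: four DNA beads at contact while a fifth approaches. The energy
# stays at four bonds' worth instead of first dropping to three.
four_bound = [1.0, 1.0, 1.0, 1.0]
for r5 in (2.0, 1.25, 1.0):
    print(r5, capped_binding_energy(four_bound + [r5]))
```

Summing the k most negative terms keeps the energy continuous, but it is not smooth where the ranking of contacts changes; that is tolerable for Monte Carlo moves, whereas Sciortino's three-body form was designed for molecular dynamics.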
The bond-swapping trick has previously been shown to accelerate system equilibration, because swapping two bonds is much faster than waiting for one bond to break and the other to form. One free parameter of the coarse-grained model is the binding strength between the integrase proteins and the DNA duplex, which is chosen to match the protein binding probability observed in the experiment. Another free parameter, which the simulations are used to explore, is the attractive (or repulsive) interaction between two proteins.
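Matching the binding strength to an experimental binding probability is a one-dimensional calibration problem. The sketch below solves it with Newton's method, obtaining the derivative by forward-mode automatic differentiation with dual numbers. The two-state model p(ε) = K/(1 + K) with K = c·exp(ε) (ε in kT, c a hypothetical concentration-dependent prefactor) is only a toy stand-in for the simulated binding probability; in practice p(ε) would come from the coarse-grained simulations, and the authors' actual fitting procedure is not reproduced here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Forward-mode AD number val + der * e, with e**2 = 0."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = lift(other)
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__

    def __truediv__(self, other):
        other = lift(other)
        return Dual(self.val / other.val,
                    (self.der * other.val - self.val * other.der) / other.val ** 2)


def lift(x) -> Dual:
    """Treat plain numbers as constants (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def dual_exp(x: Dual) -> Dual:
    e = math.exp(x.val)
    return Dual(e, e * x.der)


def bound_fraction(eps, prefactor=1e-3):
    """Hypothetical two-state model: p = K / (1 + K), K = prefactor * exp(eps)."""
    k = prefactor * dual_exp(lift(eps))
    return k / (1 + k)


def fit_binding_energy(target_p, eps=5.0, prefactor=1e-3,
                       max_step=1.0, iters=100):
    """Newton iteration on p(eps) - target_p with the derivative from AD.
    Steps are clipped to max_step (kT) because p(eps) flattens far from the root."""
    for _ in range(iters):
        p = bound_fraction(Dual(eps, 1.0), prefactor)  # seed d(eps)/d(eps) = 1
        step = (p.val - target_p) / p.der
        step = max(-max_step, min(max_step, step))
        eps -= step
        if abs(step) < 1e-12:
            break
    return eps


print(fit_binding_energy(0.5))
```

The same pattern extends to several parameters and to observables computed by a simulation, which is where library-based automatic differentiation becomes attractive.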
With this newly parameterized model, the authors used Monte Carlo simulations to sample possible conformations of compacted DNA chains. Owing to the simple interaction potentials and the coarse representation, CPU simulations are fast enough to equilibrate the system and to explore many combinations of protein-protein interaction strengths and protein concentrations. Depending on the simulation parameters, the authors observe qualitatively different patterns of DNA compaction, which they classify using an unsupervised clustering algorithm that they developed. In particular, the simulations identified two different pathways of DNA compaction, depending on whether the protein-protein interaction is attractive or not.
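The core of such Monte Carlo sampling is the Metropolis acceptance rule. As a generic, minimal sketch (not the authors' move set or energy function), the snippet below samples the orientation of one bond of a bead chain relative to a fixed neighboring bond under the hypothetical bending energy U = κ(1 − cos θ) (κ in kT). Trial orientations are generated by adding isotropic Gaussian noise to the unit vector and renormalizing, which is a symmetric proposal, so accepting with probability min(1, exp(−ΔU)) samples the Boltzmann distribution; the sample mean of cos θ can then be checked against the exact value coth κ − 1/κ.

```python
import math
import random


def metropolis_mean_cos(kappa: float, n_steps: int = 200_000,
                        step: float = 0.5, seed: int = 1) -> float:
    """Metropolis sampling of a unit bond vector t relative to a fixed
    neighboring bond along z, with bending energy U = kappa * (1 - t_z) in kT.
    Returns the sample average of cos(theta) = t_z."""
    rng = random.Random(seed)
    t = (0.0, 0.0, 1.0)  # start aligned with the neighboring bond
    energy = 0.0
    cos_sum = 0.0
    for _ in range(n_steps):
        # Symmetric trial move: isotropic Gaussian kick, then back onto the unit sphere.
        x, y, z = (c + rng.gauss(0.0, step) for c in t)
        norm = math.sqrt(x * x + y * y + z * z)
        trial = (x / norm, y / norm, z / norm)
        trial_energy = kappa * (1.0 - trial[2])
        delta = trial_energy - energy
        # Metropolis rule: accept with probability min(1, exp(-delta)).
        if delta <= 0.0 or rng.random() < math.exp(-delta):
            t, energy = trial, trial_energy
        cos_sum += t[2]  # a rejected move counts the current state again
    return cos_sum / n_steps


kappa = 5.0
print(metropolis_mean_cos(kappa), 1.0 / math.tanh(kappa) - 1.0 / kappa)
```

A full simulation applies the same acceptance rule to moves of many DNA beads and proteins, with the total energy combining bending, DNA-protein binding, and protein-protein terms.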
Overall, the authors presented a model that is as simple as possible in order to be computationally tractable, yet captures the underlying properties of the studied system. They were able to make predictions, such as the distributions of different compaction patterns and the different pathways that depend on the nature of protein-protein interactions, which can be compared with future experiments on other DNA-compacting proteins. Their work relied on the intuition, experience, and insight of the author team, which consisted of a theorist and experimentalists. With further advances in single-molecule experiments, driven not only by better resolution of standard imaging techniques such as atomic force microscopy, transmission electron microscopy, and cryoelectron microscopy but also by advances in superresolution microscopy (6), more experimental data will need to be explained by biophysical models. We can hence expect more multiscale coarse-grained models, such as the one presented here, to be developed. It is also likely that the most difficult part of coarse graining, the selection of interaction potentials and their subsequent parameterization, will be at least partly automated, likely adopting tools developed for deep learning, such as automatic differentiation (7). The fine example of experiment-driven coarse-grained model design shown by de Jager et al. warrants its selection for New and Notable in this issue.
Acknowledgments
Declaration of interests
The authors declare no competing interests.
Editor: Filip Lankas.
References
- 1. Kuhrova P., Mlynsky V., et al. Banas P. Improving the performance of the amber RNA force field by tuning the hydrogen-bonding interactions. J. Chem. Theor. Comput. 2019;15:3288–3305. doi: 10.1021/acs.jctc.8b00955.
- 2. Abramson J., Adler J., et al. Jumper J.M. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024;630:493–500. doi: 10.1038/s41586-024-07487-w.
- 3. de Jager M., Kolbeck P., Filion L., et al. Exploring protein-mediated compaction of DNA by coarse-grained simulations and unsupervised learning. Biophys. J. 2024. doi: 10.1016/j.bpj.2024.07.023.
- 4. Kolbeck P.J., de Jager M., et al. Vanderlinden W. HIV integrase compacts viral DNA into biphasic condensates. bioRxiv. 2024. Preprint. doi: 10.1101/2024.03.15.585256.
- 5. Sciortino F. Three-body potential for simulating bond swaps in molecular dynamics. Eur. Phys. J. E Soft Matter. 2017;40:3–4. doi: 10.1140/epje/i2017-11496-5.
- 6. Hellmeier J., Strauss S., et al. Jungmann R. Quantification of absolute labeling efficiency at the single-protein level. Nat. Methods. 2024:1–6. doi: 10.1038/s41592-024-02242-5.
- 7. King E.M., Du C.X., et al. Brenner M.P. Programming patchy particles for materials assembly design. Proc. Natl. Acad. Sci. USA. 2024;121. doi: 10.1073/pnas.2311891121.

