J Phys Chem B. 2025 May 8;129(20):4895–4903. doi: 10.1021/acs.jpcb.4c08622

Enhanced Exploration of Protein Conformational Space through Integration of Ultra-Coarse-Grained Models to Multiscale Workflows

Fikret Aydin*, Konstantia Georgouli, Loïc Pottier, Tomas Oppelstrup, Timothy S Carpenter, Jeremy O B Tempkin, Peer-Timo Bremer, Dwight V Nissley, Frederick H Streitz, Felice C Lightstone, Helgi I Ingólfsson
PMCID: PMC12105037  PMID: 40339149

Abstract

Computational techniques such as all-atom (AA) molecular dynamics (MD) simulations and coarse-grained (CG) models have been essential for studying various biological problems over a wide range of scales. While AA simulations provide detailed insights, they are computationally expensive for capturing dynamics over longer length and time scales. CG approaches, particularly the ultra-coarse-grained (UCG) models considered in this study, address this limitation by simplifying molecular representations, enabling the study of larger systems and longer time scales. This work focuses on the development of UCG models of proteins and their integration into the Multiscale Machine-Learned Modeling Infrastructure (MuMMI) to efficiently sample protein conformations, exemplified by the RAS-RBDCRD protein complex. By combining essential dynamics coarse graining (EDCG) and heterogeneous elastic network modeling (hENM) with anharmonic modifications, we developed UCG models based on the fluctuations observed in higher resolution Martini CG simulations. These models enable accurate sampling of protein configurations and long-range conformational changes. The incorporation of an implicit membrane model further enhanced the exploration of protein–membrane dynamics. Additionally, a novel machine-learning-based backmapping approach was developed to convert UCG structures to Martini CG representations, resulting in improved prediction accuracy. Finally, the integration of UCG models into MuMMI significantly enhances the exploration of protein configurations, offering critical insights into the role of protein dynamics in biological processes.



Introduction

In the study of protein dynamics and interactions, computational methods such as all-atom (AA) molecular dynamics (MD) simulations and coarse-grained (CG) modeling have proven invaluable. AA MD simulations provide atomically detailed trajectories, capturing the motions of protein residues around an equilibrium configuration. However, their ability to sample long-range dynamics is limited by the high computational cost of simulating large systems over extended time scales. To address this limitation, CG models have emerged as a powerful alternative. By condensing atomic degrees of freedom into simplified “beads” that interact via potential functions, CG models allow the investigation of larger systems and longer time scales at reduced resolution.

Among CG approaches, ultra-coarse-grained (UCG) models that use essential dynamics coarse graining (EDCG) for the optimal placement of UCG sites and heterogeneous elastic network models (hENMs) for the interaction potentials between UCG sites have been particularly useful in capturing protein motions. These models, and variations of them, have been used to investigate many types of proteins, including membrane proteins, to understand their key collective motions and to enhance their conformational sampling. These models were parameterized from AA MD simulations using harmonic interaction potentials between UCG sites. However, the motions described by hENMs tend to approximate only local fluctuations and do not effectively characterize long-range allosteric and conformational changes, especially those that are not present in the reference AA MD simulations. To overcome this limitation, a modified hENM has been developed that distinguishes between long-range interdomain interactions, which drive slower global motions, and short-range strong interactions, which maintain structural connectivity. This modification incorporates anharmonicity into long-range interactions, enhancing the model’s ability to sample global conformational landscapes, such as those explored during protein activation.

The importance of CG modeling extends beyond protein dynamics: it plays a crucial role in the study of complex biological processes such as membrane–protein interactions. Traditional AA simulations, while detailed, are too computationally expensive to capture large-scale behaviors such as protein reorganization driven by lipid interactions. CG models address this challenge by reducing the number of degrees of freedom, thereby decreasing computational cost and enabling the study of systems over biologically relevant time scales. Multiscale approaches, which integrate AA and CG models, further enhance the ability to study such processes by coupling different levels of resolution within a single simulation framework.

A prime example of how multiscale methods can be applied to complex biological problems is the modeling of the RAS/RAF/MAPK signaling pathway, a pathway critical for cell division and growth and a prominent target in cancer therapy. Recent advances have leveraged machine learning (ML) to create large multiscale ensembles that efficiently sample protein conformational states and dynamics. The Multiscale Machine-Learned Modeling Infrastructure (MuMMI), for instance, has demonstrated the power of ML-guided ensembles to explore the complex interactions within the RAS-RBDCRD system, providing new insights into the molecular mechanisms underlying signal activation.

The rearrangement (movement) of protein domains is challenging to sample, even at the CG scale, because of the complexity of the conformational landscape. These large-scale structural transitions often involve overcoming significant energy barriers and accessing rare states that are critical for biological function. This difficulty has been observed in complex systems such as RAS-RBDCRD, even with an advanced multiscale method such as MuMMI. One of the most critical and time-consuming parts of the MuMMI simulation campaigns is sampling the RAS-RBDCRD proteins across a vast number of membrane-specific configurations and equilibrating them into those configurations. Here, we present new developments in UCG modeling: UCG models parameterized from fluctuations observed in Martini CG simulation data, an integrated implicit membrane model, and an ML-driven backmapping approach for improved conversion of UCG to Martini CG representations. Together, these enable efficient sampling of protein conformational dynamics, exemplified by RAS-RBDCRD dynamics on the membrane. By combining UCG modeling, ensemble-based multiscale approaches, and ML techniques, we can gain a deeper understanding of the molecular mechanisms that drive biological processes associated with protein rearrangements through efficient sampling of protein configurations and membrane orientational space.

Methods

UCG Models

In this study, we generated UCG models of RAS-RBDCRD proteins by employing EDCG and hENM. EDCG is a systematic approach that defines CG sites based on collective protein motions, computed through principal component analysis (PCA) of higher resolution simulation trajectories (e.g., AA MD simulations). The EDCG method was used to automatically determine UCG sites for the RAS-RBDCRD proteins by computing collective protein motions from long Martini CG simulations obtained in a previous simulation campaign (see Figure 1A for a representative configuration from a Martini CG simulation). Because the EDCG and hENM methods typically use the carbon-alpha (Cα) atoms of proteins for parameterization from AA MD simulations, here the Martini backbone beads (BB) are used to parameterize the RAS-RBDCRD UCG models from Martini CG simulations (see Figure 1B for a representative configuration showing the mapping between Martini CG and UCG models).
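The PCA step that EDCG builds on can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the backbone-bead trajectory has already been superposed on a reference structure; the function name `essential_dynamics` is hypothetical, and the sketch covers only the essential-dynamics step, not EDCG's subsequent optimization of how residues are partitioned into UCG sites.

```python
import numpy as np


def essential_dynamics(traj: np.ndarray, n_modes: int = 10):
    """PCA ("essential dynamics") of a trajectory of bead positions.

    traj: array of shape (n_frames, n_beads, 3), assumed already superposed
    on a common reference so that overall rotation/translation is removed.
    Returns the n_modes largest eigenvalues (variances, descending) and the
    matching eigenvectors as columns of an array of shape (3 * n_beads, n_modes).
    """
    n_frames, n_beads, _ = traj.shape
    x = traj.reshape(n_frames, 3 * n_beads)    # one flattened vector per frame
    x = x - x.mean(axis=0)                     # fluctuations about the mean structure
    cov = x.T @ x / (n_frames - 1)             # (3N x 3N) covariance matrix
    evals, evecs = np.linalg.eigh(cov)         # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1][:n_modes]  # largest-variance modes first
    return evals[order], evecs[:, order]


# Example with a synthetic trajectory (random data, for illustration only).
rng = np.random.default_rng(0)
demo_traj = rng.normal(size=(500, 10, 3))
variances, modes = essential_dynamics(demo_traj, n_modes=3)
print(variances.shape, modes.shape)  # (3,) (30, 3)
```

For the RAS-RBDCRD complex, `traj` would hold one BB bead per residue (320 beads) per frame, giving a 960 × 960 covariance matrix.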

Figure 1. Martini CG and UCG representations of RAS-RBDCRD. (A) A simulation snapshot showing RAS-RBDCRD interacting with a lipid bilayer (color code: RAS, green; RBD, orange; CRD, purple; lipid bilayer phosphates, cyan). (B) UCG mapping generated by EDCG for the RAS-RBDCRD protein. Large beads correspond to the UCG sites, whereas small beads correspond to the underlying Martini CG beads (only BBs are shown). (C) Spring constant ($k$) as a function of the equilibrium bond distance ($x_0$) for the RAS-RBDCRD hENM model.

Effective interactions between the UCG sites defined by EDCG were derived using the hENM approach. Martini CG simulation data from our previous simulation campaign were also used to parameterize effective pairwise harmonic interactions between UCG sites. Initially, all UCG sites were allowed to interact with each other without any cutoff distance. For a given pair distance $x$, the interaction potential is

$$V(x) = \frac{1}{2} k (x - x_0)^2$$

where $k$ is the harmonic spring stiffness, computed from the average fluctuations between pairs of Martini CG sites, and $x_0$ is the equilibrium separation between UCG sites, determined from the average COM distance between each pair of UCG sites. The $x_0$ and $k$ values for RAS-RBDCRD are presented in Figure 1C. Because the hENM sites are interconnected through effective harmonic potentials, global conformational changes are inherently restricted in UCG simulations that use only hENM interactions. To address this limitation and explore diverse protein configurations, we used an extended hENM introduced in a previous study by Bidone et al. This modified hENM incorporates anharmonicity into the effective interactions, enabling exploration of the slower global motions inherent in the higher resolution Martini CG dynamics. The anharmonicity of long-range interactions was introduced through Morse potentials:
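A simplified sketch of the harmonic network follows, assuming UCG site positions (the COMs of their mapped beads) are already available for every frame. As a stand-in for the iterative hENM fit, which adjusts each $k$ until the network reproduces the reference fluctuations, it uses only a single equipartition estimate $k = k_B T / \mathrm{var}(x)$; the temperature, units (nm, kJ/mol), and function names are assumptions for illustration.

```python
import numpy as np

KB = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def pair_distances(sites: np.ndarray) -> np.ndarray:
    """All unique pair distances for frames of UCG site positions.

    sites: (n_frames, n_sites, 3). Returns (n_frames, n_pairs) with pairs
    ordered as in np.triu_indices(n_sites, k=1); no cutoff is applied.
    """
    i, j = np.triu_indices(sites.shape[1], k=1)
    return np.linalg.norm(sites[:, i] - sites[:, j], axis=-1)


def harmonic_parameters(sites: np.ndarray, temperature: float = 310.0):
    """Equilibrium distance x0 and a first-guess stiffness k for every pair.

    x0 is the mean pair distance over frames. k uses the equipartition
    estimate kB*T / var(x); the real hENM refines k iteratively, which this
    hypothetical helper does not do.
    """
    d = pair_distances(sites)
    x0 = d.mean(axis=0)
    k = KB * temperature / d.var(axis=0)
    return x0, k


def harmonic_energy(x, k, x0) -> float:
    """Total network energy: sum over pairs of (1/2) k (x - x0)^2."""
    return 0.5 * np.sum(k * (x - x0) ** 2)
```

With 40 UCG sites and no cutoff, `pair_distances` returns 40 × 39 / 2 = 780 distances per frame.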

$$V(x) = D_e \left[1 - e^{-\alpha (x - x_0)}\right]^2$$

where $V(x)$ is the Morse potential energy as a function of the interparticle distance $x$, $D_e$ is the well depth, $\alpha$ controls the well width, $x_0$ is the equilibrium bond distance, and the exponential term encodes the anharmonic behavior of the potential. This formulation enhances the model’s capacity to capture long-range interactions within the system, allowing efficient sampling of the conformational space of the RAS-RBDCRD proteins. Three types of interactions were selected for conversion from harmonic to Morse potentials. First, long-range interdomain interactions were identified using Gaussian mixture modeling (GMM): GMM was used to identify eight clusters from the harmonic stiffness and equilibrium distance of each pair, and the cluster with the largest average equilibrium distance between UCG sites was selected for conversion. In addition, the intradomain interactions between nonconsecutive sites of the HVR and CRD regions were converted from harmonic to anharmonic, because computational and experimental studies show that these regions have greater flexibility and conformational mobility. To assess the effect of this approach, we also generated a pure hENM and compared the distribution of distances between the CRD domain in RAF and the HVR domain in RAS. The distance distributions from UCG simulations of the same length show that hENM + GMM with anharmonic HVR/CRD interactions samples both the minimum and maximum distances better (Figure S1A). We note that the choice of which interactions to convert is system-dependent.
For example, previous studies show that RAS-RBDCRD in this state does not undergo large changes in overall protein configuration apart from the flexible CRD and HVR domains. Our aim here was therefore only to increase sampling of configurations around the equilibrium state, but the method can also sample much larger conformational changes, including domain rearrangements. To test this, we performed a more aggressive harmonic-to-anharmonic conversion by selecting four additional GMM-identified clusters of bonds and analyzed the distribution of HVR–CRD distances in the resulting UCG simulations. Converting more harmonic bonds to anharmonic Morse potentials produces a much larger shift toward larger HVR–CRD distances (Figure S1B). The mean squared error (MSE) relative to the regular UCG model was 1.38 nm² for the UCG model with a small number of anharmonic conversions (Figure S1A) and 1.95 nm² for the UCG model with many anharmonic conversions (Figure S1B).
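The conversion step can be sketched as follows, assuming each pair has already been assigned a GMM cluster label (for example with scikit-learn's `GaussianMixture(n_components=8).fit_predict` on the stacked $(k, x_0)$ values). The text does not say how $D_e$ and $\alpha$ were chosen; here, as an assumption, $D_e$ is a hypothetical fixed well depth and $\alpha$ is set so the Morse well has the same curvature at $x_0$ as the original harmonic bond, since expanding the Morse potential about $x_0$ gives $V \approx D_e \alpha^2 (x - x_0)^2$.

```python
import numpy as np


def morse_energy(x, de, alpha, x0):
    """V(x) = De * [1 - exp(-alpha (x - x0))]^2."""
    return de * (1.0 - np.exp(-alpha * (x - x0))) ** 2


def alpha_matching_harmonic(k, de):
    """Width giving the Morse well the same curvature at x0 as (1/2) k (x - x0)^2.

    Near x0, V ~ De * alpha^2 * (x - x0)^2, so alpha = sqrt(k / (2 De)).
    """
    return np.sqrt(k / (2.0 * de))


def select_long_range_cluster(labels, x0):
    """Boolean mask of pairs in the cluster with the largest mean x0."""
    clusters = np.unique(labels)
    means = [x0[labels == c].mean() for c in clusters]
    return labels == clusters[int(np.argmax(means))]


def convert_to_morse(k, x0, labels, de=10.0):
    """Mark the long-range cluster as Morse and compute its alpha values.

    de is a hypothetical well depth (kJ/mol). Pairs that stay harmonic
    get alpha = NaN.
    """
    mask = select_long_range_cluster(labels, x0)
    alpha = np.full_like(k, np.nan, dtype=float)
    alpha[mask] = alpha_matching_harmonic(k[mask], de)
    return mask, alpha
```

The nonconsecutive HVR and CRD intradomain pairs could be OR-ed into `mask` in the same way. Matching the curvature leaves small fluctuations nearly unchanged, while letting a converted bond soften and plateau at $D_e$ when it is stretched, which is what permits larger global motions.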

Furthermore, an implicit membrane model was developed by adding a virtual site and defining Morse-potential interactions between specific UCG sites and that virtual site. These interactions form a Morse potential wall that represents an implicit membrane. The interactions were parameterized with the hENM approach by computing fluctuations between the UCG sites and the virtual site using only distances along z (the axis parallel to the bilayer normal). Two UCG sites (UCG site 8 in RAS and UCG site 24 in RBDCRD; see Table S1 for the definitions of the UCG sites) were selected to interact with the Morse wall, implemented with the “fix wall/morse” functionality in LAMMPS. In addition, UCG site 20 (the site that includes the farnesyl group) is tethered to itself via a weak harmonic bond to mimic the anchoring of the protein to the lipid bilayer through the farnesyl group in the Martini CG simulations. Note that the implicit membrane in the UCG simulations is not lipid-specific; it is primarily used to keep the protein in realistic configurations with respect to the membrane, such as those observed in the Martini CG simulations. Lipid-specific detail is recovered at the next scale: MuMMI can efficiently switch between the UCG and Martini CG scales by selecting configurations from the UCG simulations based on target criteria (e.g., CRD–membrane distance) and backmapping them to the Martini CG scale on the fly. Because the UCG simulations are computationally cheap, this reduces the equilibration time needed to bring the protein into a desired configuration, while the backmapped Martini CG simulations still resolve detailed protein–lipid interactions.
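A minimal energy function for the implicit membrane is sketched below. It assumes a flat wall perpendicular to z and uses the wall form $E = D_0[e^{-2\alpha(r - r_0)} - 2e^{-\alpha(r - r_0)}]$ for $r < r_c$, which is the Morse potential above shifted down by its well depth; we believe this matches the form documented for LAMMPS `fix wall/morse`, but the parameter values, the zero-based site indexing, and the reading of "tethered to itself" as a spring to a fixed reference position are all assumptions.

```python
import numpy as np


def wall_morse(z, z_wall, d0, alpha, r0, rc):
    """Morse wall energy for sites at height z above a flat wall at z_wall.

    E = D0 [exp(-2 alpha (r - r0)) - 2 exp(-alpha (r - r0))] for r < rc,
    with r = z - z_wall. This equals De [1 - exp(-alpha (r - r0))]^2
    shifted down by De. No energy shift at rc is applied in this sketch.
    """
    r = np.asarray(z, dtype=float) - z_wall
    e = d0 * (np.exp(-2.0 * alpha * (r - r0)) - 2.0 * np.exp(-alpha * (r - r0)))
    return np.where(r < rc, e, 0.0)


def implicit_membrane_energy(pos, anchor, wall_sites, wall_params, tether_site, k_tether):
    """Implicit-membrane energy of one UCG configuration (hypothetical setup).

    pos: (n_sites, 3) positions, with z along the bilayer normal.
    wall_sites: array indices of the sites that feel the wall (e.g. sites 8, 24).
    wall_params: per-site dicts of wall_morse keyword arguments.
    tether_site: index of the farnesyl site (site 20), held near its
    reference position `anchor` by a weak harmonic spring of stiffness k_tether.
    """
    energy = 0.0
    for s in wall_sites:
        energy += float(wall_morse(pos[s, 2], **wall_params[s]))
    disp = pos[tether_site] - anchor
    energy += 0.5 * k_tether * float(disp @ disp)
    return energy
```

Only the z coordinate enters the wall term, which mirrors the parameterization from z-distances alone; motion parallel to the membrane is restrained only by the weak tether.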

Overall, the UCG scale provides enhanced sampling of the protein conformations in a realistic range that allows MuMMI to start a broad range of CG simulations. The protein conformation sampling is enhanced via the use of a highly coarse-grained model and inclusion of anharmonic interactions, and a simple implicit membrane model captures the protein orientation with respect to the membrane.

Martini CG

To evaluate whether the extended UCG modeling proposed here can sample protein conformations outside of the initial training data, we used a well-known membrane-bound protein complex, RAS-RBDCRD, for which extensive CG Martini 2 molecular dynamics simulations already exist. The previously conducted large MuMMI multiscale simulations comprise over 34K CG simulations (aggregating nearly 100 ms of simulation time) of systems with an eight-lipid-type plasma membrane (PM) mimetic and a range of membrane-bound KRAS4b (KRAS) and/or RAS-RBDCRD proteins. Here, the previous multiscale campaign data were subsampled by selecting only simulation frames with one RAS-RBDCRD protein complex in which the CRD is either fully off the membrane (CRD loop distance >4.2 nm from the bilayer center) or fully membrane-inserted (CRD loop distance <3.2 nm from the bilayer center). These sets contain ∼47,000 and ∼41,000 simulation frames, are called CRD-unbound and CRD-bound, respectively, and represent the protein conformations we want to sample between. The RAS-RBDCRD protein complex has 320 amino acids and is represented by 5255 atoms, 751 beads, and 40 beads at the AA, CG, and UCG scales, respectively. The CG systems are composed of a 30 nm × 30 nm PM patch, a membrane-associated RAS-RBDCRD protein complex, water, and 150 mM NaCl. Each system contains about 140K CG particles and, on a modern GPU system, can be simulated with ddcMD at about 1 μs/day using 1 GPU and 1 CPU, and at about the same rate with recent versions of GROMACS using 1 GPU and 10 CPUs.
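The frame selection described above amounts to two threshold masks. In this sketch, the per-frame CRD loop distances and complex counts are assumed to be precomputed arrays, and the function name is hypothetical.

```python
import numpy as np

UNBOUND_MIN_NM = 4.2  # CRD loop distance above which the CRD is fully off the membrane
BOUND_MAX_NM = 3.2    # CRD loop distance below which the CRD is fully inserted


def split_crd_states(crd_distance_nm, n_complexes):
    """Frame indices of the CRD-unbound and CRD-bound sets.

    crd_distance_nm: per-frame CRD loop distance from the bilayer center (nm).
    n_complexes: per-frame number of RAS-RBDCRD complexes; only frames with
    exactly one complex are kept. Frames with 3.2 <= d <= 4.2 nm belong to
    neither endpoint set and are discarded.
    """
    d = np.asarray(crd_distance_nm, dtype=float)
    single = np.asarray(n_complexes) == 1
    unbound = np.flatnonzero(single & (d > UNBOUND_MIN_NM))
    bound = np.flatnonzero(single & (d < BOUND_MAX_NM))
    return unbound, bound
```

Discarding the intermediate frames is deliberate: it leaves only the two endpoint ensembles, so any intermediate distances the UCG model produces come from its own sampling rather than from the training data.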

Backmapping Method

Our method for backmapping UCG configurations into CG Martini representations is based on deep learning. Specifically, we use a multilayer perceptron (MLP) neural network trained in a supervised way to learn this transformation, producing a higher-resolution version of the input structure. The model takes a two-part input (n = 900). The first part, $X_{\mathrm{UCG}}$, consists of the coordinates of the 40 UCG beads ($n_{\mathrm{UCG}}$ = 120), and the second part, $D_{\mathrm{UCG}}$, is the upper triangular portion of the distance matrix between UCG beads ($n_D$ = 780). The output, $X_{\mathrm{CG}}$, is the Cartesian coordinates of the 751 CG beads ($n_{\mathrm{CG}}$ = 2253) corresponding to the 320 amino acids of the RAS-RBDCRD protein complex. Our goal is to learn the mapping ($X_{\mathrm{UCG}}$, $D_{\mathrm{UCG}}$) → $X_{\mathrm{CG}}$. We train the MLP to minimize a CG position loss, computed as the MSE between the ground-truth and predicted CG bead positions (see Figure 3A).
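The input assembly and forward pass can be sketched in NumPy as follows. The paper's model is implemented in PyTorch; this dependency-light version only illustrates the shapes (120 + 780 = 900 inputs, 751 × 3 = 2253 outputs) and the MSE position loss. The hidden-layer widths, ReLU activations, and He-style initialization are assumptions, not the published architecture.

```python
import numpy as np

N_UCG, N_CG = 40, 751


def ucg_features(x_ucg):
    """Concatenate flattened UCG coordinates with upper-triangular distances.

    x_ucg: (n_structures, 40, 3) -> (n_structures, 120 + 780) = (n, 900).
    """
    i, j = np.triu_indices(x_ucg.shape[1], k=1)
    d_ucg = np.linalg.norm(x_ucg[:, i] - x_ucg[:, j], axis=-1)
    return np.concatenate([x_ucg.reshape(len(x_ucg), -1), d_ucg], axis=1)


def init_mlp(sizes, rng):
    """Random (weight, bias) pairs for a fully connected net with these layer sizes."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp_forward(params, x):
    """ReLU hidden layers followed by a linear output layer."""
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)
    w, b = params[-1]
    return x @ w + b


def position_loss(pred, target):
    """MSE between predicted and ground-truth CG coordinates."""
    return np.mean((pred - target) ** 2)


rng = np.random.default_rng(0)
params = init_mlp([900, 512, 512, 3 * N_CG], rng)  # hidden widths are hypothetical
x_ucg = rng.normal(size=(8, N_UCG, 3))
pred = mlp_forward(params, ucg_features(x_ucg))
print(pred.reshape(-1, N_CG, 3).shape)  # (8, 751, 3)
```

Feeding the distances alongside the raw coordinates gives the network rotation- and translation-invariant information about the UCG shape directly, rather than requiring it to infer internal geometry from coordinates alone.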

Figure 3. Description and performance of the backmapping method. (A) Architecture of the MLP neural network used for backmapping from UCG to CG. Given the 3D coordinates of and distances between the beads in a UCG structure as input, the method predicts the corresponding CG representation. (B) Distribution of rmsd in the validation data set after model training. (C) Representative examples of CG structures predicted by the model, with (i) a low rmsd of 1.2 Å, (ii) a mean rmsd of 2.1 Å, and (iii) a high rmsd of 6.9 Å.

The model was developed with PyTorch. During training, 10% of the data is randomly held out for validation. Each input UCG protein structure is standardized per bead to zero mean and unit standard deviation, accounting for variations in the ranges of the bead coordinates. The model is trained for 300 epochs, by which point the loss has converged, with a batch size of 128 and the Adam optimizer. To accelerate training, we use the Horovod framework for distributed, data-parallel training across 8 GPUs, with the input data randomly and equally distributed over the GPUs. At each epoch, the performance metric is computed by averaging the CG position loss across all GPUs.
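The preprocessing and batching described above can be sketched as follows. "Standardized per bead" is read here as scaling each bead's x, y, and z coordinates by statistics computed across all training structures, which is an interpretation; the seeds and helper names are hypothetical, and the Horovod sharding across 8 GPUs is not shown.

```python
import numpy as np


def standardize_per_bead(x, mean=None, std=None):
    """Zero-mean, unit-variance scaling of every bead coordinate.

    x: (n_structures, n_beads, 3). Unless mean/std are given, statistics are
    computed over structures, separately for each bead and Cartesian component.
    """
    if mean is None:
        mean = x.mean(axis=0)
        std = x.std(axis=0)
    return (x - mean) / std, mean, std


def train_val_split(n, val_fraction=0.1, seed=0):
    """Random index split holding out `val_fraction` of the structures."""
    idx = np.random.default_rng(seed).permutation(n)
    n_val = int(round(val_fraction * n))
    return idx[n_val:], idx[:n_val]


def minibatches(indices, batch_size=128, seed=0):
    """Yield shuffled mini-batches of indices for one epoch."""
    idx = np.random.default_rng(seed).permutation(indices)
    for start in range(0, len(idx), batch_size):
        yield idx[start:start + batch_size]
```

Validation structures should be scaled with the training `mean` and `std` (passed as the second and third arguments) so that statistics from the held-out 10% do not leak into preprocessing.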

MuMMI Workflow

The entity responsible for orchestrating every CG simulation in MuMMI is the workflow manager, a Python script that repeatedly requests structures from the UCG server. The UCG server encapsulates the structure generator and ensures that the LAMMPS simulations run correctly. It also reads the frames generated by the UCG simulations every couple of seconds to keep its internal sampling object up to date. The workflow manager uses Maestro, a convenient wrapper around the Flux scheduler, to schedule jobs on HPC resources. Flux is a modern HPC scheduler that can schedule thousands of jobs in a matter of seconds; the UCG server and each CG simulation are Flux jobs. The mechanics of the workflow management are detailed in previous works using MuMMI.
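The interaction between the workflow manager and the UCG server can be illustrated with a small in-process sketch. Everything here is hypothetical: the class name, the `readers` callables standing in for LAMMPS output files, and uniform random sampling in place of MuMMI's criterion-based selection (e.g., on CRD–membrane distance). It does not use any Flux or Maestro API.

```python
import random
from collections import deque


class UCGServerSketch:
    """Hypothetical stand-in for the UCG server: keeps a bounded pool of
    recent frames from several running UCG simulations and hands them out
    on request."""

    def __init__(self, readers, pool_size=10_000, seed=0):
        # readers: one callable per UCG simulation, each returning the list of
        # frames written since its previous call (hypothetical interface).
        self.readers = readers
        self.pool = deque(maxlen=pool_size)  # oldest frames drop out first
        self.rng = random.Random(seed)

    def refresh(self):
        """Poll every simulation for new frames (done every few seconds)."""
        for read_new_frames in self.readers:
            self.pool.extend(read_new_frames())

    def request(self, n):
        """Return up to n distinct frames sampled uniformly from the pool."""
        self.refresh()
        k = min(n, len(self.pool))
        return self.rng.sample(list(self.pool), k)
```

A workflow-manager loop would call `request` repeatedly and pass the returned frames on to backmapping and GROMACS validation; the bounded pool keeps the sampler focused on recent UCG output as the simulations continue to run.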

Once MuMMI has generated structures, the workflow runs GROMACS on them for a few iterations to validate them (i.e., to make sure the structures are valid with respect to molecular dynamics constraints). The validated structures are then sent to the last stage of the workflow, the CG simulations.

The CG simulations module is responsible for building complete simulation systems from the validated initial structures and then running GROMACS on each newly created system for a given amount of time. Typically, each simulation runs for approximately 600 ns, which takes about 12 h of computation on the Lassen machine at LLNL. Lassen is a 23 petaflops supercomputer at the DOE Lawrence Livermore National Laboratory with 756 compute nodes; each node has four NVIDIA V100 GPUs (64 GB of GPU memory in total), two 22-core IBM Power9 CPUs, and 256 GB of memory.
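As a quick consistency check, the quoted wall time corresponds to a throughput of about 1.2 μs/day, in line with the ∼1 μs/day rate given earlier for the ∼140K-particle Martini system:

```latex
\frac{600\ \text{ns}}{12\ \text{h}}
  = \frac{600\ \text{ns}}{0.5\ \text{day}}
  = 1200\ \text{ns/day}
  = 1.2\ \mu\text{s/day}
```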

Results and Discussion

Sampling of Critical Parameters via UCG Simulations

The initial UCG model for the RAS-RBDCRD complex, where RBDCRD comprises the RAS binding domain (RBD) and the cysteine-rich domain (CRD) of the RAF1 protein, was developed using Martini CG simulation data from our previous simulation campaign (Figure 1A,B). Each protein (RAS and RBDCRD) within the RAS-RBDCRD complex is represented by 20 UCG sites (40 UCG sites in total), corresponding to approximately seven to nine residues (5–36 CG beads) per UCG site for RAS and RBDCRD, respectively. The mapping between the UCG and Martini CG models is provided in Table S1.

The use of hENM with Morse potentials allowed us to explore important global protein motions inherent in the higher resolution Martini CG dynamics. In addition, the implicit membrane model, implemented using Morse potential walls, captured protein motions relative to the membrane similar to those observed in the Martini CG simulations (Figure 2A). To test whether the UCG simulations reproduce the behavior observed in the Martini CG simulations, we compared the CRD–membrane distance distributions from the two models. The CRD–membrane distance is an important parameter that determines the behavior of the RAF proteins (e.g., the initial step of RAF activation); it is defined as the distance along the bilayer normal between the center of mass (COM) of the hydrophobic/cationic CRD loop residues and the COM of the bilayer. The Martini CG simulations used to parameterize the UCG models comprised two sets, CRD-bound (CRD interacts with the membrane) and CRD-unbound (CRD is in solution), whose CRD–membrane distance distributions are shown in Figure 2B. The combined Martini CG data also contain distances between these two states that are not shown in Figure 2B; to represent the more realistic use case of UCG-empowered MuMMI (sampling between two endpoint distributions), the distance data were filtered into the CRD-bound and CRD-unbound states only.
The same distance distribution was obtained from the UCG simulation by calculating the COM distance along the bilayer normal between the UCG sites corresponding to the hydrophobic/cationic CRD loop residues (UCG sites 34 and 36) and the UCG site corresponding to the farnesyl group (UCG site 20, the reference of the implicit membrane), and adding an offset of 1.8 nm, estimated from the average COM-z distance between the CRD loops and the lipid bilayer in the Martini CG simulations (Figure 2C). The UCG simulations sample the relevant distances observed in the higher resolution Martini CG simulations, including the intermediate distances between the CRD-bound and CRD-unbound states. Moreover, comparing the full CRD–membrane distance distributions from the Martini CG and UCG simulations (Figure S2) shows that the UCG model captures the full Martini distribution well except at the largest distances, above ∼8 nm, which are not of specific interest here because the protein complex is then far from the membrane. The UCG model can also explore distances outside the range of the Martini CG training data (e.g., <1.5 nm). Capturing configurations at these low distances is important because energy barriers can prevent deep insertion of the CRD domain into the lipid bilayer in the Martini CG simulations. Besides representing complex systems accurately, UCG simulations are also extremely fast. Martini CG simulations using ddcMD on one GPU and one CPU core reach up to 1.034 μs/day (∼140K CG particles including the required membrane and solvent). In our tests, UCG simulations (40 UCG protein particles with implicit membrane and solvent) using LAMMPS on one CPU core simulate about 138 μs/day on average, ∼134 times faster than a Martini CG simulation while using fewer resources.
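The UCG-side distance calculation can be written as a short function. This sketch assumes the protein lies on the +z side of the implicit membrane, treats the UCG site numbers as zero-based array indices, and weights the two CRD loop sites equally unless masses are supplied. The difference is kept signed so that values below the 1.8 nm offset (loop sites below the farnesyl reference) remain representable.

```python
import numpy as np

CRD_LOOP_SITES = (34, 36)  # UCG sites of the hydrophobic/cationic CRD loops
FARNESYL_SITE = 20         # farnesyl site, the reference of the implicit membrane
OFFSET_NM = 1.8            # mean COM-z offset between CRD loops and bilayer (Martini CG)


def ucg_crd_membrane_distance(traj, masses=None):
    """CRD-membrane distance (nm) for each frame of a UCG trajectory.

    traj: (n_frames, n_sites, 3), z along the bilayer normal.
    Returns z_COM(CRD loop sites) - z(farnesyl site) + 1.8 nm per frame.
    """
    loop = list(CRD_LOOP_SITES)
    w = np.ones(len(loop)) if masses is None else np.asarray(masses, dtype=float)[loop]
    z_loop = (traj[:, loop, 2] * w).sum(axis=1) / w.sum()  # (weighted) COM along z
    return z_loop - traj[:, FARNESYL_SITE, 2] + OFFSET_NM
```

A histogram of the returned values over a UCG trajectory gives a distribution comparable to the Martini CG one.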

Figure 2. Comparison of CRD–membrane distance distributions from Martini CG and UCG simulations. (A) UCG representation of RAS-RBDCRD and a schematic showing the location of the implicit membrane. (B) CRD–membrane distance distributions from the Martini CG simulations used to parameterize the UCG models. Red and green bars correspond to the CRD-bound (CRD interacts with the membrane) and CRD-unbound (CRD is in solution) cases. (C) CRD–membrane distance distribution from a UCG simulation.

Another advantage of the UCG simulations is the ability to sample extended motions due to the smoother free energy landscape. To understand if the UCG layer improves the protein sampling, the distance distribution between the CRD domain in RAF and HVR domain in RAS proteins obtained from a Martini CG simulation and from a UCG simulation (using the Martini CG beads corresponding to the UCG sites in CRD and HVR domains) was compared (Figure S3). The distance distributions show that the use of the UCG simulation increases the sampling of both larger and smaller distances between CRD and HVR domains in comparison to that of the Martini CG simulations, allowing the sampling of both more extended and more compact structures in a much more computationally efficient manner. The incorporation of UCG simulations in the MuMMI framework enables improved protein sampling in the sense that multiple simulation scales are combined to inform each other. The UCG scale accelerates sampling of possible conformers as starting points, and it can generate more extended or compact configurations (e.g., Figure S3); these conformers are validated at the higher resolution and sampled locally by simulating them at the Martini CG scale. After the first UCG-MuMMI iteration, the resulting ensemble of CG simulations is added to the training data of the UCG model used in the next iteration.

Backmapping of UCG to CG via Deep Neural Networks

MLP neural networks that take UCG coordinates and pairwise interbead distances as input are employed to transform UCG representations into more detailed Martini CG representations. As illustrated in Figure 3B, the model exhibits very good predictive performance. Specifically, the distribution of root-mean-square deviation (rmsd) between the ground-truth and predicted CG positions in the validation data indicates that most predicted CG beads are reasonably close to their ground-truth coordinates, with lowest and mean rmsd values of 1.2 and 2.1 Å, respectively. Importantly, the rmsd calculation includes all protein beads, including those in the highly variable flexible regions of the protein. Figure 3C presents three representative predicted CG protein structures at varying rmsd values.
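The rmsd reported here can be computed per structure as below. This sketch assumes the predicted and ground-truth coordinates are already in the same frame, so no superposition is performed, and that coordinates are in Å; the function name is hypothetical.

```python
import numpy as np


def rmsd(pred, ref):
    """Root-mean-square deviation over all beads, without superposition.

    pred, ref: (..., n_beads, 3) arrays in the same units (e.g. Å).
    Returns one value per structure (leading dimensions are preserved).
    """
    sq = np.sum((pred - ref) ** 2, axis=-1)  # squared displacement of each bead
    return np.sqrt(sq.mean(axis=-1))         # average over beads, then square root
```

Applied to a whole validation batch of shape (n, 751, 3), it returns n values whose minimum and mean correspond to the summary statistics quoted above.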

Integration of UCG Models within MuMMI

MuMMI has been extended to integrate UCG models alongside continuum and CG models (Figure 4). This integration leverages ML-based importance sampling within an autonomous workflow framework. The framework enables simultaneous evolution of simulations at different scales, continuously refining regions of interest identified at coarser scales and providing feedback to adjust the simulations at finer scales.

Figure 4. Key components of the MuMMI workflow. Three modules: structure generator (pink), macro-model (blue), and CG simulations (green). These modules operate in sequence, from the structure generator creating UCG structures to the CG simulations module simulating them. The CG simulations produce feedback files that the structure generator consumes to improve its internal sampling process.

The UCG models were developed for the RAS-RBDCRD system using the EDCG method to determine dynamic domains from Martini CG trajectories. Groups of Martini CG beads were mapped into UCG sites that preserve these dynamic domains, capturing functional and interdomain motions of the protein. Intramolecular interactions were initially determined using the hENM technique, with harmonic bonds assigned within RAS-RBDCRD. Some of the harmonic bonds were then converted to anharmonic bonds to improve the model’s capacity to sample larger-scale motions. Finally, an implicit membrane was parameterized based on z-distance distributions of UCG sites relative to the membrane, with specific UCG sites interacting with the membrane through multiple Morse walls. More detailed information on the development of the UCG models can be found in the Methods section.

The MuMMI workflow enables the UCG campaign. In the previous simulation campaign, MuMMI was used iteratively to sample conformations between sets A and B, simulate these structures using GROMACS, and then leverage the simulated trajectories to improve its internal sampling of the conformational space; the end goal of that campaign was to find pathways between the two sets. The MuMMI framework comprised three main parts: the structure generator, the macro-model, and the MD code simulating the structures (see Figure 4). The structure generator is a pipeline with three main stages: the sampler, the generator, and the validator. The sampler is responsible for sampling the conformational space between sets A and B to generate new trajectories for MuMMI to explore. In this work, instead of relying on ML models to generate structures, we use UCG simulations to sample configurations between the two states quickly, taking advantage of the highly coarse-grained model and the absence of explicit solvent. MuMMI runs about 20 UCG simulations with LAMMPS that continuously output positions, and we sample directly from these positions rather than using an ML model to generate structures. The rest of the pipeline is very similar to the one described by Georgouli et al.

UCG Simulation Campaign

The UCG simulation campaign represents a systematic and iterative approach to studying the behavior of RAS-RBDCRD proteins on multiple scales. By integrating UCG and Martini CG simulation techniques, the campaign aims to unravel the intricate dynamics of RAS-RBDCRD proteins interacting with realistic lipid bilayers. Through a combination of advanced simulation techniques, including ML-based backmapping and iterative refinement, the campaign provides a robust framework for capturing biologically relevant interactions and structural details.

Simulation systems comprised RAS-RBDCRD proteins within a mixed lipid bilayer. Lipid compositions were chosen from continuum model simulations using ML importance sampling, while initial Martini protein configurations were generated by backmapping UCG structures. The UCG-incorporated MuMMI simulation campaign produced approximately 90,000 valid initial Martini structures in two rounds.

The UCG model was reparameterized between the two rounds, and a comparison of bond stiffnesses between the rounds showed a shift in the stiffness of certain parts of RAS-RBDCRD. Note that the aim of refining the model between rounds is not to increase its accuracy but to use the new training data, from simulations started in protein configurations between the endpoint ensembles of the original training data, to capture protein modes and fluctuations not observed in the original training data, which could enable sampling of different protein motions. More specifically, the stiffness of the harmonic bonds between UCG sites 22 (RBD) and 31 (RBD) and between sites 22 (RBD) and 23 (RBD) increased significantly from round 1 to round 2. These results demonstrate that the intramolecular interactions of the UCG model can be retuned to reflect specific interactions between the protein and the lipid bilayer observed in the first round of the campaign (Figure 5A).

Figure 5. The effect of reparameterization of the UCG model on the RAS-RBDCRD behavior at the Martini CG and UCG level. The comparison of (A) spring constants in the RAS-RBDCRD hENM model and (B) the CRD–membrane distance between round 1 and round 2 of the UCG simulations performed in the MuMMI simulation campaign. (C) The comparison of the CRD–membrane distance distribution between rounds 1 and 2 of the Martini CG simulations performed in the MuMMI simulation campaign.

To determine whether the UCG simulations efficiently sample a relevant range of protein configurations with respect to the membrane, we analyzed the CRD–membrane distance distribution in both rounds of the simulation campaign (Figure 5B). The range of CRD–lipid bilayer distances was consistent with the distributions obtained from Martini CG simulations. Notably, in the second round, the CRD–membrane distances shifted toward slightly smaller values as the CRD embedded further into the membrane. To select configurations for backmapping into Martini CG representations, the resulting UCG configurations were sampled randomly according to the CRD–membrane distance distribution.

In the Martini CG simulations, the CRD starts in solution and typically spends significant time there before interacting with the lipid bilayer. Once fully embedded in the bilayer, the CRD does not return to solution on the microsecond time scales of the CG simulations because of high energy barriers. This explains the large peak at around 1.5–2 nm in the Martini CG distributions (Figure 5C). UCG models, in contrast, have smoother free energy landscapes, which ease transitions between membrane-bound and -unbound states and yield more uniform distance distributions (Figure 5B). Moreover, these CRD–membrane distance distributions are consistent with those obtained from all-atom MD simulations, which span distances of ∼0.5 to ∼8 nm. A previous experimental study also found that the CRD lies within 3 nm of the lipid headgroups, based on the significant loss of signal intensity for most CRD residues in the presence of a gadolinium spin-labeled lipid headgroup; given a monolayer thickness of ∼2 nm, this corresponds to a CRD–membrane distance of ∼0 to ∼5 nm.
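The distance analysis and the selection step can be sketched in Python. Because the distance distribution P(d) and the free energy along the CRD–membrane distance d are related by F(d) = −k_BT ln P(d) up to an additive constant, a flatter UCG distribution corresponds to a smoother free-energy profile along d. The sketch assumes per-frame distances (in nm) have already been computed, uses 310 K, and implements one plausible reading of "sampled randomly according to the distance distribution": stratified random sampling in which each distance bin receives picks in proportion to its population. The function names, binning, and allocation rule are illustrative assumptions, and the usage lines use synthetic distances, not campaign data.

```python
import numpy as np

KB_KJ_PER_MOL_K = 0.0083144626  # Boltzmann constant, kJ/(mol K)


def distance_pmf(distances_nm, bins, temperature_k=310.0):
    """Histogram P(d) and Boltzmann-inverted F(d) = -kB*T*ln P(d), min set to 0."""
    hist, edges = np.histogram(distances_nm, bins=bins, density=True)
    with np.errstate(divide="ignore"):
        free_energy = -KB_KJ_PER_MOL_K * temperature_k * np.log(hist)
    # Empty bins stay at +inf; shift so the lowest finite value is zero.
    free_energy -= free_energy[np.isfinite(free_energy)].min()
    return hist, edges, free_energy


def select_for_backmapping(distances_nm, n_select, bins, seed=0):
    """Stratified random pick of frame indices, proportional to bin population."""
    rng = np.random.default_rng(seed)
    # Bin index per frame; values outside [bins[0], bins[-1]] go to the end bins.
    which_bin = np.digitize(distances_nm, bins[1:-1])
    counts = np.bincount(which_bin, minlength=len(bins) - 1)
    # Largest-remainder rounding so the per-bin quotas sum to n_select.
    exact = counts / counts.sum() * n_select
    quota = np.floor(exact).astype(int)
    leftover = int(n_select - quota.sum())
    quota[np.argsort(exact - quota)[::-1][:leftover]] += 1
    picked = []
    for b, q in enumerate(quota):
        members = np.flatnonzero(which_bin == b)
        if q > 0:
            picked.extend(rng.choice(members, size=min(q, members.size), replace=False))
    return np.sort(np.array(picked, dtype=int))


# Synthetic distances for illustration only.
toy_rng = np.random.default_rng(0)
toy_d = toy_rng.uniform(0.5, 8.0, size=5000)
bins = np.linspace(0.0, 8.5, 18)  # seventeen 0.5 nm wide bins
hist, edges, F = distance_pmf(toy_d, bins)
chosen = select_for_backmapping(toy_d, n_select=100, bins=bins, seed=1)
```

Proportional stratification preserves the shape of the sampled distribution while guaranteeing that sparsely populated distance ranges are still represented; allocating an equal quota to every bin instead would flatten the selected set toward uniform coverage in d.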

Martini CG simulations were performed through MuMMI to investigate the dynamics of RAS-RBDCRD proteins, backmapped from UCG to CG, interacting with a PM-mimetic bilayer composed of eight lipid species. This study aimed to evaluate how iterative feedback between simulation rounds affects the structural and orientational behavior of RAS-RBDCRD on the membrane. A key metric was the CRD–membrane distance, a critical descriptor of the protein’s membrane interactions.

Figure 5C shows the probability distribution of CRD–membrane distances obtained from Martini CG simulations in the two rounds of the simulation campaign. Although the distributions from the two rounds are broadly similar, round 2 shows a subtle but distinct shift toward smaller distances. This shift likely reflects the refinements introduced by feedback from round 1, underscoring the ability of the iterative approach to capture subtle changes in protein–membrane interactions.
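A shift between rounds such as the one in Figure 5C could be quantified with simple summary statistics: the change in median and mean distance, and the two-sample Kolmogorov–Smirnov statistic D, the largest vertical gap between the two empirical cumulative distributions. The Python sketch below is an illustrative addition, not an analysis reported in this work; the function names are hypothetical and the usage line uses toy distances.

```python
import numpy as np


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    # Empirical CDFs are right-continuous step functions that change only at
    # sample values, so evaluating them there (side="right") finds the maximum.
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def compare_rounds(d_round1, d_round2):
    """Summarize how CRD-membrane distances (nm) moved from round 1 to round 2."""
    return {
        "median_shift_nm": float(np.median(d_round2) - np.median(d_round1)),
        "mean_shift_nm": float(np.mean(d_round2) - np.mean(d_round1)),
        "ks_D": ks_statistic(d_round1, d_round2),
    }


summary = compare_rounds([1.0, 2.0, 3.0], [0.5, 1.5, 2.5])  # toy distances
```

A negative shift means round 2 distances are smaller. `scipy.stats.ks_2samp` returns the same D together with a p-value, but frames from MD trajectories are time-correlated, so such p-values would overstate significance unless the data are first decorrelated, for example by subsampling or block averaging.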

The shift observed in round 2 highlights the value of iterative refinement for capturing subtle but significant changes. Because protein–membrane systems are highly sensitive to their environment, accurate distance distributions are essential for reliable, predictive models. Capturing them helps ensure that the structures backmapped from the UCG simulations are biologically relevant, providing a robust framework for studying membrane-associated proteins at multiple scales.

Conclusions

In this work, we have demonstrated the utility of UCG models with implicit membranes, built using EDCG and hENM techniques, for efficiently sampling protein conformational dynamics. Integrating these UCG models into MuMMI further extends the scope of multiscale simulations: it significantly increased the sampling efficiency of protein configurations, thereby reducing equilibration time and computational cost, as exemplified here by sampling of RAS-RBDCRD protein dynamics. The successful deep-learning-based backmapping of UCG structures to Martini CG representations demonstrates the potential of combining UCG methods with ML techniques to bridge the gap between scales. The addition of an implicit membrane to the UCG model shows that UCG models can be generalized to explore the dynamics of membrane-associated proteins. Our findings underscore the potential of UCG models as a powerful tool for exploring protein dynamics at biologically relevant time scales, balancing computational efficiency and molecular accuracy. These results lay the groundwork for future studies of other membrane-associated protein systems and large-scale biological processes.

Supplementary Material

jp4c08622_si_001.pdf (736.1KB, pdf)

Acknowledgments

This work was performed under the auspices of the US Department of Energy (DOE) by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 and under the auspices of the National Cancer Institute (NCI) by Frederick National Laboratory for Cancer Research (FNLCR) under Contract 75N91019D00024. This work has been supported by the NCI-DOE Collaboration established by the US DOE and the NCI of the National Institutes of Health and by Laboratory Directed Research and Development at Lawrence Livermore National Laboratory (24-SI-005). This research used resources of the Oak Ridge Leadership Computing Facility (OLCF), which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725. For computing time, the authors thank the Advanced Scientific Computing Research Leadership Computing Challenge (ALCC) for time on Summit and the Livermore Institutional Grand Challenge for time on Lassen. For computing support, the authors thank OLCF and LC staff. Release: LLNL-JRNL-871206.

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jpcb.4c08622.

  • Comparisons of CRD-HVR distance distributions obtained from the regular UCG simulations and the UCG Morse simulations, CRD–membrane distance distributions obtained from the full Martini CG simulations and UCG simulations, comparison of the CRD-HVR distance distributions obtained from UCG simulations and Martini CG simulations, and UCG to CG bead mapping (PDF)

The authors declare no competing financial interest.

References

  1. Ingólfsson H. I., Lopez C. A., Uusitalo J. J., de Jong D. H., Gopal S. M., Periole X., Marrink S. J. The Power of Coarse Graining in Biomolecular Simulations. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2014;4:225–248. doi: 10.1002/wcms.1169.
  2. Saunders M. G., Voth G. A. Coarse-Graining Methods for Computational Biology. Annu. Rev. Biophys. 2013;42:73–93. doi: 10.1146/annurev-biophys-083012-130348.
  3. Zhang Z., Lu L., Noid W. G., Krishna V., Pfaendtner J., Voth G. A. A Systematic Methodology for Defining Coarse-Grained Sites in Large Biomolecules. Biophys. J. 2008;95:5073–5083. doi: 10.1529/biophysj.108.139626.
  4. Lyman E., Pfaendtner J., Voth G. A. Systematic Multiscale Parameterization of Heterogeneous Elastic Network Models of Proteins. Biophys. J. 2008;95:4183–4192. doi: 10.1529/biophysj.108.139733.
  5. Bidone T. C., Polley A., Jin J., Driscoll T., Iwamoto D. V., Calderwood D. A., Schwartz M. A., Voth G. A. Coarse-Grained Simulation of Full-Length Integrin Activation. Biophys. J. 2019;116:1000–1010. doi: 10.1016/j.bpj.2019.02.011.
  6. Yu A., Pak A. J., He P., Monje-Galvan V., Casalino L., Gaieb Z., Dommer A. C., Amaro R. E., Voth G. A. A Multiscale Coarse-Grained Model of the SARS-CoV-2 Virion. Biophys. J. 2021;120:1097–1104. doi: 10.1016/j.bpj.2020.10.048.
  7. Madsen J. J., Sinitskiy A. V., Li J., Voth G. A. Highly Coarse-Grained Representations of Transmembrane Proteins. J. Chem. Theory Comput. 2017;13:935–944. doi: 10.1021/acs.jctc.6b01076.
  8. Atilgan A. R., Durell S. R., Jernigan R. L., Demirel M. C., Keskin O., Bahar I. Anisotropy of Fluctuation Dynamics of Proteins with an Elastic Network Model. Biophys. J. 2001;80:505–515. doi: 10.1016/S0006-3495(01)76033-X.
  9. Yang L., Song G., Jernigan R. L. Protein Elastic Network Models and the Ranges of Cooperativity. Proc. Natl. Acad. Sci. U.S.A. 2009;106:12347–12352. doi: 10.1073/pnas.0902159106.
  10. Aydin F., Katkar H. H., Voth G. A. Multiscale Simulation of Actin Filaments and Actin-Associated Proteins. Biophys. Rev. 2018;10:1521–1535. doi: 10.1007/s12551-018-0474-8.
  11. Fan J., Saunders M. G., Voth G. A. Coarse-Graining Provides Insights on the Essential Nature of Heterogeneity in Actin Filaments. Biophys. J. 2012;103:1334–1342. doi: 10.1016/j.bpj.2012.08.029.
  12. Bhatia H., Aydin F., Carpenter T. S., Lightstone F. C., Bremer P. T., Ingólfsson H. I., Nissley D. V., Streitz F. H. The Confluence of Machine Learning and Multiscale Simulations. Curr. Opin. Struct. Biol. 2023;80:102569. doi: 10.1016/j.sbi.2023.102569.
  13. Ayton G. S., Noid W. G., Voth G. A. Multiscale Modeling of Biomolecular Systems: In Serial and in Parallel. Curr. Opin. Struct. Biol. 2007;17:192–198. doi: 10.1016/j.sbi.2007.03.004.
  14. Enkavi G., Javanainen M., Kulig W., Róg T., Vattulainen I. Multiscale Simulations of Biological Membranes: The Challenge to Understand Biological Phenomena in a Living Substance. Chem. Rev. 2019;119:5607–5774. doi: 10.1021/acs.chemrev.8b00538.
  15. Markesteijn A., Karabasov S., Scukins A., Nerukh D., Glotov V., Goloviznin V. Concurrent Multiscale Modelling of Atomistic and Hydrodynamic Processes in Liquids. Philos. Trans. R. Soc., A. 2014;372:20130379. doi: 10.1098/rsta.2013.0379.
  16. Peter C., Kremer K. Multiscale Simulation of Soft Matter Systems - from the Atomistic to the Coarse-Grained Level and Back. Soft Matter. 2009;5:4357–4366. doi: 10.1039/b912027k.
  17. Bhatia, H.; et al. Generalizable Coordination of Large Multiscale Ensembles: Challenges and Learnings at Scale. The International Conference for High Performance Computing, Networking, Storage and Analysis; ACM, 2021; p 10.
  18. Di Natale, F.; et al. A Massively Parallel Infrastructure for Adaptive Multiscale Simulations: Modeling RAS Initiation Pathway for Cancer. The International Conference for High Performance Computing, Networking, Storage and Analysis; ACM: Denver, CO, 2019; p 57.
  19. Ingólfsson H. I., Neale C., Carpenter T. S., Shrestha R., López C. A., Tran T. H., Oppelstrup T., Bhatia H., Stanton L. G., Zhang X., et al. Machine Learning-Driven Multiscale Modeling Reveals Lipid-Dependent Dynamics of RAS Signaling Proteins. Proc. Natl. Acad. Sci. U.S.A. 2022;119:e2113297119. doi: 10.1073/pnas.2113297119.
  20. Ingólfsson H. I., et al. Machine Learning-Driven Multiscale Modeling: Bridging the Scales with a Next-Generation Simulation Infrastructure. J. Chem. Theory Comput. 2023;19:2658–2675. doi: 10.1021/acs.jctc.2c01018.
  21. López C. A., et al. Asynchronous Reciprocal Coupling of Martini 2.2 Coarse-Grained and CHARMM36 All-Atom Simulations in an Automated Multiscale Framework. J. Chem. Theory Comput. 2022;18:5022–5045. doi: 10.1021/acs.jctc.2c00168.
  22. Nguyen K., López C. A., Neale C., Van Q. N., Carpenter T. S., Di Natale F., Travers T., Tran T. H., Chan A. H., Bhatia H., et al. Exploring CRD Mobility During RAS/RAF Engagement at the Membrane. Biophys. J. 2022;121:3630–3650. doi: 10.1016/j.bpj.2022.06.035.
  23. Shrestha R., Carpenter T. S., Van Q. N., Agamasu C., Tonelli M., Aydin F., Chen D., Gulten G., Glosli J. N., López C. A., et al. Author Correction: Membrane Lipids Drive Formation of KRAS4b-RAF1 RBDCRD Nanoclusters on the Membrane. Commun. Biol. 2024;7:445. doi: 10.1038/s42003-024-06061-4.
  24. Marrink S. J., Risselada H. J., Yefimov S., Tieleman D. P., de Vries A. H. The Martini Force Field: Coarse Grained Model for Biomolecular Simulations. J. Phys. Chem. B. 2007;111:7812–7824. doi: 10.1021/jp071097f.
  25. Ingólfsson H. I., et al. Capturing Biologically Complex Tissue-Specific Membranes at Different Levels of Compositional Complexity. J. Phys. Chem. B. 2020;124:7819–7829. doi: 10.1021/acs.jpcb.0c03368.
  26. Zhang X., Sundram S., Oppelstrup T., Kokkila-Schumacher S. I. L., Carpenter T. S., Ingólfsson H. I., Streitz F. H., Lightstone F. C., Glosli J. N. ddcMD: A Fully GPU-Accelerated Molecular Dynamics Program for the Martini Force Field. J. Chem. Phys. 2020;153:045103. doi: 10.1063/5.0014500.
  27. Abraham M. J., Murtola T., Schulz R., Páll S., Smith J. C., Hess B., Lindahl E. GROMACS: High Performance Molecular Simulations through Multi-Level Parallelism from Laptops to Supercomputers. SoftwareX. 2015;1–2:19–25. doi: 10.1016/j.softx.2015.06.001.
  28. Kingma, D. P.; Ba, J. Adam: A Method for Stochastic Optimization. 2014, arXiv:1412.6980.
  29. Sergeev, A.; Del Balso, M. Horovod: Fast and Easy Distributed Deep Learning in TensorFlow. 2018, arXiv:1802.05799.
  30. Di Natale, F. Maestro Workflow Conductor; USDOE National Nuclear Security Administration (NNSA), Web, 2017.
  31. Ahn, D. H.; et al. Flux: Overcoming Scheduling Challenges for Exascale Workflows. In IEEE/ACM Workflows in Support of Large-Scale Science (WORKS); IEEE: Dallas, TX, 2018; pp 10–19.
  32. Thompson A. P., Aktulga H. M., Berger R., Bolintineanu D. S., Brown W. M., Crozier P. S., in 't Veld P. J., Kohlmeyer A., Moore S. G., Nguyen T. D., et al. LAMMPS - a Flexible Simulation Tool for Particle-Based Materials Modeling at the Atomic, Meso, and Continuum Scales. Comput. Phys. Commun. 2022;271:108171. doi: 10.1016/j.cpc.2021.108171.
  33. Plimpton S. Fast Parallel Algorithms for Short-Range Molecular Dynamics. J. Comput. Phys. 1995;117:1–19. doi: 10.1006/jcph.1995.1039.
  34. Georgouli K., et al. Generating Protein Structures for Pathway Discovery Using Deep Learning. J. Chem. Theory Comput. 2024;20:8795–8806. doi: 10.1021/acs.jctc.4c00816.
  35. Aydin F., Courtemanche N., Pollard T. D., Voth G. A. Gating Mechanisms During Actin Filament Elongation by Formins. eLife. 2018;7:e37342. doi: 10.7554/elife.37342.


Articles from The Journal of Physical Chemistry. B are provided here courtesy of American Chemical Society
