2023 Nov 13;19(23):8987–8997. doi: 10.1021/acs.jctc.3c01053

K-Means Clustering Coarse-Graining (KMC-CG): A Next Generation Methodology for Determining Optimal Coarse-Grained Mappings of Large Biomolecules

Jiangbo Wu 1, Weizhi Xue 1, Gregory A Voth 1,*
PMCID: PMC10720621  PMID: 37957028

Abstract


Coarse-grained (CG) molecular dynamics (MD) has become a method of choice for simulating a wide range of large-scale biomolecular processes; consequently, the systematic definition of CG mappings for biomolecules remains an important topic. Appropriate CG mappings can significantly enhance the representability of a CG model and improve its ability to capture critical features of large biomolecules. In this work, we present a systematic and more generalized method called K-means clustering coarse-graining (KMC-CG), which builds on the earlier approach of essential dynamics coarse-graining (ED-CG). KMC-CG removes the sequence-dependent constraints of ED-CG, allowing it to search a larger space of candidate mappings and thus to discover more physically optimal CG mappings. Furthermore, the K-means clustering algorithm optimizes the CG mapping variationally, efficiently, and stably. The new method is tested on three systems: ATP-bound G-actin, the HIV-1 CA pentamer, and the Arp2/3 complex. In these examples, the CG models generated by KMC-CG better capture the structural, dynamic, and functional domains. KMC-CG therefore provides a robust and consistent approach to generating CG models of large biomolecules that can then be more accurately parametrized by either bottom-up or top-down CG force fields.

Introduction

With the ongoing growth in computational power, particularly the deployment of GPU-powered molecular dynamics (MD) engines, all-atom (AA) MD has become an extensively used tool for studying large biomolecular systems.1,2 Nevertheless, as the time and length scales of the systems of interest continue to increase, many critical biological and even cellular processes, such as the polymerization dynamics of actin filaments3 and the self-assembly of viral capsids,4,5 remain beyond the reach of AA MD simulations. Further development of coarse-grained (CG) MD is therefore necessary to reduce the complexity and to study these systems at coarser scales.6 CG MD simplifies the representation of a molecular system by reducing the number of degrees of freedom, ideally in a systematic fashion. All of the atoms of a large biomolecule are thus represented by a smaller set of CG sites or “beads”, and a CG mapping provides the rules for projecting an AA structure onto a reduced CG model. A well-designed CG mapping can significantly enhance the representability of a CG model and improve its ability to capture key structural features of large biomolecules.7–9

While chemical intuition can provide relatively accurate CG mapping rules for small organic molecules,10–12 it is challenging to extend this approach to highly complex large biomolecules such as actin, the Arp2/3 complex, and ribosomes. To address this challenge, several systematic methods have been proposed to construct optimal CG mappings that capture the structural and functional domains of large biomolecules.13 These methods can be broadly classified into two categories: structure-based methods and dynamics-based methods. Structure-based methods rely on the structural features of large biomolecules to optimize the CG mappings. For example, Arkhipov et al.14 used a Topology-Representing Network (TRN) to study the structural dynamics of viral capsids, optimizing the CG mapping by matching the atomistic mass distribution of the viral proteins. Zhang et al.15 proposed a convolutional and K-means coarse-graining (CK-CG) method to construct a CG model of a biomolecule from low-resolution cryo-EM electron density data. Chakraborty et al.,16 Ho et al.,17 and Webb et al.18 have developed CG mapping methods based on graph theory, in which a molecule is represented by nodes and edges and the algorithms reduce the large graph by merging nodes and edges.

On the other hand, based on protein dynamics, Zhang et al.13,19,20 proposed the essential dynamics coarse-graining (ED-CG) method, which variationally optimizes CG sites by assigning residues that move collectively to the same CG site. This method has proven valuable for understanding actin and the E. coli 70S ribosome.19,21 Li et al.22 suggested fluctuation maximization coarse-graining (FM-CG), which packs particles in different functional domains of biomolecules into different CG sites. Both methods use a sequence-based linear algorithm that divides the primary sequence into N contiguous subsets by finding the optimal cutting positions. However, this algorithm suffers from an important drawback: it ignores the spatial proximity between residues. Residues that are adjacent in space but distant in sequence, such as paired residues in a β sheet or in parallel α helices (common structural features of many proteins), can also have strongly correlated motions and should therefore be eligible to be packed into the same CG site. Furthermore, atomistic force fields are parametrized assuming that each atom is a charged point mass, and CG sites are usually treated in the same isotropic way. If the residues represented by a CG site are linearly distributed, as along an α helix, then different nonbonded interaction parameters would be needed along different directions, which is difficult to realize in practice. The residues represented by a CG site should therefore be distributed as spherically as possible in space. Thus, it is important to develop a more generalized algorithm that captures both the spatial and the dynamic features of large biomolecules.

In this work, building on the previously developed ED-CG method,13 we develop a more generalized systematic approach, K-means clustering coarse-graining (KMC-CG), to generate optimal CG mappings for highly coarse-grained models of biomolecules. The new approach addresses the limitations of previous approaches and generates more intuitive CG mappings.

The remainder of this article is structured as follows. We first introduce the KMC-CG method, describe its numerical implementation, and briefly review the heterogeneous elastic network model (HeteroENM) method used here to build one form of CG force field. We also summarize the details of the AA MD simulations. Next, the new method is illustrated in three case studies: ATP-bound G-actin (G-ATP), the capsid protein of human immunodeficiency virus type 1 (HIV-1 CA protein), and the Arp2/3 complex in both inactive and active states. Finally, we conclude with some remarks on the significance of our findings.

Theory and Methods

In the ED-CG13 approach, the optimal CG mapping is generated by variationally minimizing the following residual:

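$$\chi_{\mathrm{ED}}^{2}=\frac{1}{3Nn_t}\sum_{I=1}^{N}\sum_{i\in I}\sum_{j\in I}\sum_{t=1}^{n_t}\left[\Delta\mathbf{r}_{i}^{\mathrm{ED}}(t)-\Delta\mathbf{r}_{j}^{\mathrm{ED}}(t)\right]^{2}\tag{1}$$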

where N is the number of CG sites to be defined, $n_t$ is the number of configurations in the MD trajectory after translational and rotational motion has been removed by fitting to a reference frame, and $\Delta\mathbf{r}_i^{\mathrm{ED}}(t)$ is the displacement from equilibrium of the Cα atom of residue i (denoted Cα atom i) at time t in the essential dynamics (ED) subspace. If two Cα atoms i and j that belong to the same CG site I move in a highly correlated way, the fluctuation difference in the brackets of eq 1 is very small, which lowers $\chi_{\mathrm{ED}}^2$. ED-CG is a sequence-based algorithm: its CG sites are contiguous along the primary sequence of the protein. Minimizing $\chi_{\mathrm{ED}}^2$ divides the primary sequence into N consecutive groups, and the center of mass (COM) of each group is taken as the corresponding CG site.
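To make the sequence constraint and the pairwise residual concrete, the following minimal NumPy sketch builds a contiguous mapping from cut positions and evaluates a pairwise fluctuation residual of the eq 1 type. It is an illustration, not the ED-CG implementation: it assumes the displacements have already been fitted (and, for ED-CG proper, projected onto the ED subspace), the 1/(3 N n_t) normalization is an assumption, and `labels_from_cuts` and `pairwise_residual` are hypothetical names.

```python
import numpy as np


def labels_from_cuts(n_residues, cuts):
    """Sequence-contiguous mapping (ED-CG style).

    `cuts` are sorted residue indices (0-based) at which a new CG site begins;
    N - 1 cuts split the chain into N consecutive segments.
    """
    labels = np.zeros(n_residues, dtype=int)
    for cut in cuts:
        labels[cut:] += 1  # every residue after a cut moves to the next site
    return labels


def pairwise_residual(disp, labels, n_sites):
    """Pairwise fluctuation residual of eq 1 form (normalization assumed).

    disp:   (n_t, n_res, 3) displacements of C-alpha atoms from equilibrium
    labels: (n_res,) CG site index of each C-alpha atom
    """
    n_t = disp.shape[0]
    total = 0.0
    for site in range(n_sites):
        members = disp[:, labels == site, :]                     # (n_t, n_I, 3)
        diff = members[:, :, None, :] - members[:, None, :, :]   # all (i, j) pairs
        total += np.sum(diff ** 2)
    return total / (3 * n_sites * n_t)
```

If every atom in a site moves identically, that site contributes nothing to the residual, which is exactly the "collective motion" criterion described above.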

Building on ED-CG, we extend the sequence-based approach to three-dimensional Euclidean space by removing the constraint that CG sites be contiguous in the primary sequence and by adding two new terms to the residual. The original variational residual $\chi_{\mathrm{ED}}^2$ is then rewritten as

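$$\chi_{0}^{2}=\frac{1}{3Nn_t}\sum_{I=1}^{N}\sum_{i\in I}\sum_{j\in I}\sum_{t=1}^{n_t}\left[\Delta\mathbf{r}_{i}(t)-\Delta\mathbf{r}_{j}(t)\right]^{2}\tag{2}$$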

where the superscript “ED” is dropped because the displacements are no longer projected onto the ED subspace. A new residual $\chi_a^2$, which represents spatial proximity, is then given by

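$$\chi_{a}^{2}=\frac{1}{3Nn_t}\sum_{I=1}^{N}\sum_{i\in I}\sum_{t=1}^{n_t}\left[\mathbf{r}_{i}(t)-\mathbf{r}_{I,\mathrm{COM}}(t)\right]^{2}\tag{3}$$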

where $\mathbf{r}_i(t)$ is the position of Cα atom i at time t and $\mathbf{r}_{I,\mathrm{COM}}(t)$ is the position of CG site I at time t, taken as the COM of all Cα atoms in group I. This term constrains the distance between each residue and the center of its CG site. The second new term, $\chi_b^2$, is a penalty term defined as

$$\chi_{b}^{2}=\sum_{I=1}^{N}n_{\mathrm{nodes},I}\tag{4}$$

Here, $n_{\mathrm{nodes},I}$ denotes the number of “discontinuities” (nodes) in the residue sequence of CG site I. For example, if a CG site contains residues with indices [1-2-3-(node)-7-(node)-12-13-14], then $n_{\mathrm{nodes},I}$ = 2. This term measures how many discontiguous segments a CG site contains and helps merge outlier residues into a neighboring, longer segment. The overall residual $\chi^2$ is then combined as follows

$$\chi^{2}=\chi_{0}^{2}+\beta\chi_{a}^{2}+\gamma\chi_{b}^{2}\tag{5}$$

where β and γ are weighting factors; β is typically set to 1 without specific tuning. By modifying β and γ, one can change the relative contributions of the fluctuation and spatial-distance terms and thereby generate customized CG mappings.
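The position term, the connectivity penalty, and their combination into the overall residual can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; it assumes equal Cα masses (so each site COM is a centroid), non-empty sites, a 1/(3 N n_t) normalization for $\chi_a^2$, $\chi_b^2$ as the unnormalized sum of $n_{\mathrm{nodes},I}$, and residues numbered consecutively along a single chain. All function names are hypothetical.

```python
import numpy as np


def site_positions(coords, labels, n_sites):
    """r_{I,COM}(t) for every site and frame; equal C-alpha masses assumed.

    coords: (n_t, n_res, 3) -> (n_t, n_sites, 3).  Every site must be non-empty.
    """
    return np.stack(
        [coords[:, labels == s, :].mean(axis=1) for s in range(n_sites)], axis=1
    )


def position_residual(coords, labels, n_sites):
    """chi_a^2: squared distance of each atom from its own site COM."""
    n_t = coords.shape[0]
    com = site_positions(coords, labels, n_sites)
    diff = coords - com[:, labels, :]          # r_i(t) - r_{I,COM}(t)
    return np.sum(diff ** 2) / (3 * n_sites * n_t)


def count_nodes(residue_numbers):
    """n_nodes,I: number of breaks in the sorted residue list of one site."""
    res = sorted(residue_numbers)
    return sum(1 for a, b in zip(res, res[1:]) if b != a + 1)


def connectivity_penalty(labels):
    """chi_b^2 as the plain sum of n_nodes,I over sites (assumed form)."""
    members = {}
    for position, site in enumerate(labels):   # position = sequence index
        members.setdefault(int(site), []).append(position)
    return sum(count_nodes(r) for r in members.values())


def total_residual(chi0_sq, chi_a_sq, chi_b_sq, beta=1.0, gamma=1.0):
    """chi^2 = chi_0^2 + beta * chi_a^2 + gamma * chi_b^2 (gamma default is a placeholder)."""
    return chi0_sq + beta * chi_a_sq + gamma * chi_b_sq
```

For the example site [1-2-3-7-12-13-14] given earlier, `count_nodes` returns 2.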

KMC-CG

In our trials, most of the optimization methods tested, including stochastic steepest descent and simulated annealing, failed to converge efficiently to stable solutions, and CG mappings could not be generated consistently. However, the pairwise summation in $\chi_0^2$ can be expanded to eliminate the dummy index j as follows:

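$$\chi_{0}^{2}=\frac{2}{3Nn_t}\sum_{I=1}^{N}n_{I}\sum_{i\in I}\sum_{t=1}^{n_t}\left[\Delta\mathbf{r}_{i}(t)-\Delta\mathbf{r}_{I,\mathrm{COM}}(t)\right]^{2}\tag{6}$$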

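where $\Delta\mathbf{r}_{I,\mathrm{COM}}(t) = \mathbf{r}_{I,\mathrm{COM}}(t) - \mathbf{r}_{I,\mathrm{COM}}^{\mathrm{eq}}$, $n_I$ is the number of residues in CG site I, and $\mathbf{r}_{I,\mathrm{COM}}^{\mathrm{eq}}$ is the equilibrium position of CG site I. The expansion rests on the identity $\sum_{i\in I}\sum_{j\in I}(\mathbf{a}_i-\mathbf{a}_j)^2 = 2n_I\sum_{i\in I}(\mathbf{a}_i-\bar{\mathbf{a}}_I)^2$, where $\bar{\mathbf{a}}_I$ is the mean of the $\mathbf{a}_i$ over site I; because all Cα atoms have the same mass, the mean displacement of the atoms in site I is $\Delta\mathbf{r}_{I,\mathrm{COM}}(t)$. The derivation thus transforms the pairwise fluctuation term $\chi_0^2$ into a sum of distances between each Cα atom i and its CG site I, which allows the K-means clustering algorithm23 to be applied directly to efficiently optimize the $\chi_0^2 + \beta\chi_a^2$ part of $\chi^2$ after defining a new distance term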

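$$J_{i,I}=2n_{I}\sum_{t=1}^{n_t}\left[\Delta\mathbf{r}_{i}(t)-\Delta\mathbf{r}_{I,\mathrm{COM}}(t)\right]^{2}+\beta\sum_{t=1}^{n_t}\left[\mathbf{r}_{i}(t)-\mathbf{r}_{I,\mathrm{COM}}(t)\right]^{2}\tag{7}$$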

where $J_{i,I}$ is the K-means distance between Cα atom i and CG site I. Once the K-means clustering algorithm reaches a stable solution, stochastic minimization of the penalty term $\chi_b^2$ is used to improve chain connectivity and generate the final CG mapping.
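The expansion behind eq 6 and the resulting K-means distance can be checked numerically. The sketch below is illustrative only: it assumes equal Cα masses (so each site COM is a centroid), takes eq 7 in the form $J_{i,I} = 2n_I\sum_t[\Delta\mathbf{r}_i(t)-\Delta\mathbf{r}_{I,\mathrm{COM}}(t)]^2 + \beta\sum_t[\mathbf{r}_i(t)-\mathbf{r}_{I,\mathrm{COM}}(t)]^2$ (the pairwise expansion with a common normalization dropped), and uses the hypothetical function name `kmeans_distance`.

```python
import numpy as np


def kmeans_distance(coords, ref, labels, n_sites, beta=1.0):
    """J[i, I] between every C-alpha atom i and every CG site I (eq 7 form, assumed).

    coords: (n_t, n_res, 3) fitted trajectory r_i(t)
    ref:    (n_res, 3) equilibrium positions r_i^eq (e.g. the trajectory mean)
    labels: (n_res,) current assignment, used to define n_I and the site COMs
    """
    disp = coords - ref                                    # Delta r_i(t)
    J = np.full((coords.shape[1], n_sites), np.inf)        # empty sites stay at +inf
    for s in range(n_sites):
        mask = labels == s
        n_I = int(mask.sum())
        if n_I == 0:
            continue
        com = coords[:, mask, :].mean(axis=1)              # r_{I,COM}(t), equal masses
        com_disp = disp[:, mask, :].mean(axis=1)           # Delta r_{I,COM}(t)
        fluct = np.sum((disp - com_disp[:, None, :]) ** 2, axis=(0, 2))
        dist = np.sum((coords - com[:, None, :]) ** 2, axis=(0, 2))
        J[:, s] = 2 * n_I * fluct + beta * dist
    return J


# Numerical check of the identity used for eq 6, for one site:
# sum_{i,j} |a_i - a_j|^2 == 2 n_I sum_i |a_i - mean(a)|^2
rng = np.random.default_rng(0)
a = rng.normal(size=(50, 7, 3))                            # (n_t, n_I, 3)
pairwise = np.sum((a[:, :, None, :] - a[:, None, :, :]) ** 2)
com_form = 2 * a.shape[1] * np.sum((a - a.mean(axis=1, keepdims=True)) ** 2)
print(np.isclose(pairwise, com_form))  # True
```

With this distance, step 2 of the workflow reduces to `np.argmin(J, axis=1)`, exactly as in ordinary K-means.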

The numerical workflow of KMC-CG is illustrated in Figure 1, which proceeds as follows:

  • (1) Given the number of CG sites N, each Cα atom i is randomly assigned a cluster label I ∈ [1, N], indicating that Cα atom i belongs to CG site I. The COM of the atoms sharing label I is calculated as the initial position of CG site I.

  • (2) For each Cα atom i, the K-means distance $J_{i,K}$ between Cα atom i and each CG site K is calculated. Cα atom i is then reassigned the cluster label $s_i$ of the CG site with the smallest $J_{i,K}$, that is, the “nearest” CG site to Cα atom i.

  • (3) After the cluster labels are updated in step 2, the COM of the atoms sharing each label is recalculated to determine the new position of each CG site.

  • (4) Steps 2 and 3 are repeated until convergence is achieved.

  • (5) Once the preliminary CG mapping is obtained in step 4, stochastic minimization is used to merge outlier residues into neighboring longer segments. At step t of the minimization, an attempt is made to move one outlier residue to its neighboring chain segment. The move is accepted only if the resulting $\chi_t^2$ is smaller than the previous value $\chi_{t-1}^2$. This process continues until the final CG mapping is generated.
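Steps 1 to 4 amount to a Lloyd-style K-means loop with a custom distance. The sketch below is a hedged illustration, not the authors' code, under several assumptions: equal Cα masses, the trajectory mean as the equilibrium structure, $J_{i,K} = 2n_K\sum_t[\Delta\mathbf{r}_i(t)-\Delta\mathbf{r}_{K,\mathrm{COM}}(t)]^2 + \beta\sum_t[\mathbf{r}_i(t)-\mathbf{r}_{K,\mathrm{COM}}(t)]^2$ as the form of eq 7, and a balanced random initial labeling (a slight variant of step 1 that guarantees no site starts empty). Step 5 is omitted, and `kmc_cluster` is a hypothetical name.

```python
import numpy as np


def kmc_cluster(coords, n_sites, beta=1.0, max_iter=100, seed=0):
    """Steps 1-4: K-means-style clustering of C-alpha atoms with distance J_{i,K}.

    coords: (n_t, n_res, 3) trajectory already fitted to a reference frame.
    Returns a (n_res,) array of CG site labels in [0, n_sites).
    """
    rng = np.random.default_rng(seed)
    n_res = coords.shape[1]
    ref = coords.mean(axis=0)                  # assumed equilibrium structure
    disp = coords - ref                        # Delta r_i(t)
    # Step 1: random labels (balanced, so no site starts empty)
    labels = rng.permutation(np.arange(n_res) % n_sites)
    for _ in range(max_iter):
        # Steps 1/3: site COMs and J from the current labels
        J = np.full((n_res, n_sites), np.inf)
        for s in range(n_sites):
            mask = labels == s
            n_K = int(mask.sum())
            if n_K == 0:                       # an emptied site attracts no atoms
                continue
            com = coords[:, mask, :].mean(axis=1)
            com_disp = disp[:, mask, :].mean(axis=1)
            fluct = np.sum((disp - com_disp[:, None, :]) ** 2, axis=(0, 2))
            dist = np.sum((coords - com[:, None, :]) ** 2, axis=(0, 2))
            J[:, s] = 2 * n_K * fluct + beta * dist
        # Step 2: move each atom to its "nearest" site
        new_labels = np.argmin(J, axis=1)
        # Step 4: stop once no label changes
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

Because $J_{i,K}$ depends on the current site populations $n_K$, the monotonic-decrease guarantee of plain K-means need not carry over, which is one reason to cap the loop at `max_iter` and to run several seeds.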

Figure 1.


Numerical workflow of the KMC-CG method. The algorithm starts by defining the number of CG sites and assigning initial labels to each Cα atom. The K-means clustering algorithm is used to optimize $\chi^2$. Stochastic parallel minimization is then implemented to merge the outlier residues. Finally, the optimal CG mapping is obtained once the overall optimization reaches convergence.

Once the minimization of $\chi^2$ converges to a stable solution, each CG site is located at the COM of all Cα atoms that belong to it. However, a single KMC-CG minimization run does not guarantee a global minimum of the residual $\chi^2$, so multiple independent replicas with different initial labelings must be run to ensure the accuracy and reliability of the results. In our tests, the clustering algorithm used in the initial stage reduced $\chi^2$ to a relatively small value in a much shorter time than optimization of the pairwise-summation form. As a supplementary step, the subsequent optimization of chain continuity further improves the robustness of the results. This issue is also discussed in the following sections.
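Running independent replicas and keeping the lowest residual can be wrapped generically. In this hedged sketch, `run_once` is a hypothetical callable that performs one complete KMC-CG optimization for a given random seed and returns `(labels, chi2)`. Because site indices are arbitrary, labels are renumbered in order of first appearance so that identical mappings from different replicas are counted together.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Sequence, Tuple

Labels = Tuple[int, ...]


def canonical(labels: Sequence[int]) -> Labels:
    """Relabel sites by order of first appearance (removes index permutations)."""
    new_index: Dict[int, int] = {}
    for lab in labels:
        new_index.setdefault(lab, len(new_index))
    return tuple(new_index[lab] for lab in labels)


def run_replicas(run_once: Callable[[int], Tuple[Sequence[int], float]],
                 seeds: Iterable[int]) -> Tuple[Labels, float, Counter]:
    """Return the lowest-chi^2 mapping, its chi^2, and how often each mapping occurred."""
    results = [run_once(seed) for seed in seeds]
    best_labels, best_chi2 = min(results, key=lambda r: r[1])
    counts = Counter(canonical(labels) for labels, _ in results)
    return canonical(best_labels), best_chi2, counts
```

The counter makes it easy to report both the lowest-residual mapping and the most frequently recovered one across replicas.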

HeteroENM

HeteroENM24 can be used to parametrize the force field of the CG model generated by KMC-CG, although other choices are certainly possible.9 The HeteroENM method is designed to match the fluctuations of a CG model to those of the reference AA trajectory by iteratively adjusting the force constants $k_{ij}$ of effective harmonic interactions between pairs of CG sites within a given cutoff distance. The effective CG potential takes the following form:

$$U_{\mathrm{CG}}=\sum_{i<j}\frac{k_{ij}}{2}\left(r_{ij}-r_{ij,0}\right)^{2}\tag{8}$$

where $r_{ij}$ is the distance between CG sites i and j, the sum runs over pairs within the cutoff distance, and $r_{ij,0}$ is the equilibrium distance between the two sites averaged over the AA trajectory. The spring constants are optimized by minimizing the difference between the bond-length fluctuations calculated from the CG trajectory and those calculated from the AA trajectory. A well-designed HeteroENM force field has been shown to accurately reproduce the fluctuations of the AA trajectory as mapped onto the CG sites.24
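The ingredients of eq 8 can be sketched in NumPy: the pair list within a cutoff, the harmonic energy, and the per-pair length fluctuations that HeteroENM tries to match between CG and AA-mapped trajectories. This is an illustrative sketch with hypothetical function names; it assumes the $k_{ij}/2$ convention for eq 8 and does not implement the iterative HeteroENM update of the force constants.

```python
import numpy as np


def enm_pairs(ref, cutoff):
    """Index arrays (i < j) of CG-site pairs whose reference distance is <= cutoff."""
    d = np.linalg.norm(ref[:, None, :] - ref[None, :, :], axis=-1)
    i, j = np.triu_indices(len(ref), k=1)
    keep = d[i, j] <= cutoff
    return i[keep], j[keep]


def enm_energy(x, i, j, k, r0):
    """Eq 8: U = sum over pairs of (k_ij / 2) * (r_ij - r_ij,0)^2."""
    r = np.linalg.norm(x[i] - x[j], axis=-1)
    return 0.5 * np.sum(k * (r - r0) ** 2)


def pair_length_stats(traj, i, j):
    """Mean length r_ij,0 and fluctuation <(r_ij - r_ij,0)^2> of each pair.

    traj: (n_t, n_sites, 3), e.g. an AA-mapped CG trajectory or a CG trajectory.
    """
    r = np.linalg.norm(traj[:, i, :] - traj[:, j, :], axis=-1)   # (n_t, n_pairs)
    return r.mean(axis=0), r.var(axis=0)
```

In a fitting loop, the second output of `pair_length_stats` would be compared between the CG and AA-mapped trajectories to decide how each $k_{ij}$ should change.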

Molecular Dynamics Simulations

ATP-Bound G-Actin

All MD simulations of G-ATP were performed using GROMACS version 2018.3.25 The crystal structure of G-ATP (PDB 1NWK(26)) was used for the actin simulation. The missing residues of the DNase-I loop (D-loop) were modeled using the same approach as in previous work.27,28 The structure was solvated with TIP3P water using the VMD autosolvate plugin,29 with at least 11 Å between the protein and the box edges, and the system was neutralized using the VMD autoionize plugin29 with a KCl concentration of 100 mM. Periodic boundary conditions were applied in all three dimensions. The CHARMM27 force field with CMAP correction30,31 was used, with the particle mesh Ewald method32 for long-range electrostatic interactions. After energy minimization with the steepest descent algorithm, the system was pre-equilibrated at 1 atm and 310 K (in the constant NVT ensemble followed by the constant NPT ensemble). Production runs were performed for 100 ns with a 2 fs time step in the constant NPT ensemble using Parrinello–Rahman pressure coupling33 and v-rescale temperature coupling.34

The HIV-1 CA Pentamer

The simulation system was constructed largely as described by Yu et al.35 The model was built from the cryo-electron microscopy (cryo-EM) structure of the HIV-1 CA pentamer (PDB 5MCY(36)), with missing residues added using MODELLER.37 After solvation with TIP3P water38 and neutralization with 150 mM NaCl, the system was simulated with the AA MD package NAMD 2.1339 using the CHARMM36m force field with CMAP correction.40 The system was energy minimized and equilibrated with a setup similar to that of the G-ATP system, and production runs were performed in the constant NPT ensemble at 1 atm and 310 K.

The Arp2/3 Complex

To study the Arp2/3 complex in both inactive and active states, the crystal structure of the inactive state (PDB 4JD2(41)) and the cryo-EM structure of the active state (PDB 7TPT(42)) were used to build the simulation systems. The inactive structure contains only the Arp2/3 complex, whereas the full active structure also contains the mother and daughter filaments. For this study, we constructed a simplified simulation system by selecting the subset of the full junction that contains only the Arp2/3 complex. Although the mother filament is required to maintain the structural integrity of the active state, it was neglected here because the primary purpose was to validate the new method.

CHARMM-GUI was used to build the simulation boxes and to generate the input scripts for both states.43,44 The systems were parametrized with the CHARMM36m force field with CMAP correction.40 All Arp2/3 simulations were performed in GROMACS version 2019.225 with particle mesh Ewald summation32 for electrostatic interactions. The systems were constructed in the same fashion as described in previous work.42,45 KCl was added at a concentration of 50 mM to compensate for the net charge of the protein. The systems were pre-equilibrated in the constant NVT ensemble for 10 ns using the v-rescale thermostat34 and then in the constant NPT ensemble for 10 ns using Parrinello–Rahman pressure coupling.33 Two independent 300 ns production trajectories, one for the inactive state and one for the active state, were then generated under constant NPT conditions (310 K and 1 atm).

Coarse-Grained Simulations

All CG MD simulations were performed using LAMMPS (version 20 Sep 2021).46 HeteroENM24 was used to parametrize the CG force field of each protein. After an initial energy minimization, all CG MD simulations were performed in the constant NVT ensemble by integrating the Langevin equation of motion47 with a 10 fs time step at 310 K. For analysis, the CG trajectories were fitted to the first frame to remove translational and rotational motion using MDAnalysis.48,49 CG models were visualized with PyMOL.50
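The analysis step (fitting each frame to the first frame, then computing per-site RMSF) can be sketched with plain NumPy. This is a stand-in for the MDAnalysis workflow, not its API: it assumes equal weights for all sites, uses the Kabsch algorithm for the rigid-body fit, and measures RMSF about the mean of the fitted frames. The function names are hypothetical.

```python
import numpy as np


def kabsch_fit(mobile, target):
    """Superimpose mobile (n, 3) onto target (n, 3) by optimal rotation + translation."""
    mc, tc = mobile.mean(axis=0), target.mean(axis=0)
    P, Q = mobile - mc, target - tc
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against improper rotations
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return P @ R.T + tc


def rmsf(traj):
    """Per-site RMSF of a (n_t, n_sites, 3) trajectory after fitting to frame 0."""
    fitted = np.stack([kabsch_fit(frame, traj[0]) for frame in traj])
    mean = fitted.mean(axis=0)
    return np.sqrt(np.mean(np.sum((fitted - mean) ** 2, axis=-1), axis=0))
```

A purely rigid-body trajectory should give an RMSF of essentially zero for every site, which is a convenient sanity check before comparing CG and AA-mapped curves.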

Results and Discussion

CG Models of ATP-Bound G-Actin

G-ATP is a globular protein of 375 residues with a bound ATP located deep inside its nucleotide-binding cleft.3,51,52 Previous studies have identified multiple functional domains of G-ATP that are strongly associated with various biological processes. For example, the D-loop (residues 40–51) is believed to regulate filament stability in response to the nucleotide state,27 the H-loop (263–272) is necessary for forming normal monomer–monomer interfaces in stable filamentous actin,53 and the flap (236–251) is related to ATP hydrolysis.54 Other important regions include the S-loop (70–79) and the SD4-hinge (220–235). Based on the structural features observed in the crystal structure, Saunders and Voth52 proposed an intuitive 11-site CG model of G-actin, shown in Figure 2a and Figure 3a, which serves as the reference model for the following comparison. Because the bound ATP is typically assigned its own CG site, “11-site” here means that only the peptide chain is divided into 11 domains; the whole protein is actually represented by a 12-site CG model.

Figure 2.


Different 11-site CG mappings of ATP-bound G-actin. (a) The reference mapping from the literature52 mapped onto the RMSF curve, with colors corresponding to different CG sites. The x axis represents the index of the Cα atoms (as in panels b and c), and the y axis is the RMSF of each Cα atom. The four subdomains are labeled SD1, SD2, SD3, and SD4, and the biologically functional domains are marked by arrows. (b) The results of 200 independent replicas of the KMC-CG method. The y axis represents the replica index. Cα atoms with the same color belong to the same CG site. (c) The 11-site mapping obtained from the ED-CG method mapped onto the RMSF curve, with each color corresponding to a different CG site. The y axis denotes the RMSF of each Cα atom.

Figure 3.


Different 11-site CG models of ATP-bound G-actin. (a) The reference intuitive model from the literature,52 with the D-loop labeled in red. (b) The 11-site model from the KMC-CG method. (c) The 11-site model from the ED-CG method. In all three models, the four large domains and seven small subdomains are represented by four large CG beads and seven small CG beads, respectively. Each CG bead is located at the COM of its corresponding domain.

To evaluate the convergence and stability of KMC-CG, a 60 ns trajectory of G-ATP was used as input, and 200 parallel replicas of KMC-CG were run. The resulting CG mappings, with 11 colors representing different CG sites, are shown in Figure 2b. Note that the colors in each panel of Figure 2 only distinguish CG beads within that mapping and do not correspond across mappings. All KMC-CG replicas consistently assigned the D-loop, SD4-hinge, flap, and H-loop to their corresponding CG sites, and half of the replicas captured the S-loop. The D-loop, N-terminus, and C-terminus regions exhibit particularly high root-mean-square fluctuations (RMSF), so each warrants a single CG site to represent its characteristic motion. Additionally, another region, the W-loop (165–172), which undergoes a conformational change upon binding of the WH2 domain,53,55 is not assigned its own CG bead in the reference model52 but is still identified by KMC-CG. In contrast, ED-CG captured only an extended D-loop (41–66) and grouped the SD4-hinge and the flap into the same CG site even though they are far apart in space. These results demonstrate that KMC-CG stably captures the functional regions of G-ATP, owing to their compact domain structures and large RMSF.

Among the 200 replicas, one of the most common mappings generated by KMC-CG was selected (Figure 3b) for comparison with the reference intuitive CG model from the literature52 (Figure 3a) and the ED-CG model (Figure 3c). Detailed mapping results can be found in the Supporting Information. KMC-CG identified four main subdomains: SD1 (44 residues: 4–13, 16–31, 86–103), SD2 (48 residues: 14–15, 32–40, 49–85), SD3 (70 residues: 136–177, 296–305, 329–346), and SD4 (22 residues: 178–185, 260–273). These differ from the four subdomains identified in the literature:52 SD1 (113 residues: 5–33, 80–147, 334–349), SD2 (24 residues: 34–39, 52–69), SD3 (93 residues: 148–179, 273–333), and SD4 (51 residues: 180–219, 252–262). Except for SD2, the subdomains of the reference intuitive model contain more residues than those of the KMC-CG model, which provides a relatively more uniform mapping. Because the KMC-CG residual mixes fluctuation and position terms, it balances the trade-off between them: a CG site cannot contain too many residues, or the position residual would increase the overall residual. In addition, KMC-CG does not always generate the same mapping for regions with small RMSF, because the weak fluctuations of many Cα atoms make these domains hard to discriminate, which leads to slightly different results across optimizations. These weak fluctuations arise mainly from localized motions that have limited impact on the long time scales of interest, so several CG mappings of these domains are plausible. This allowed us to perform multiple KMC-CG trials and select the mapping with the smallest residual as the final model.
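Summaries such as the residue ranges listed above can be generated directly from a per-residue label array. The helpers below are hypothetical and assume 1-based residue numbering along a single chain; ranges are written with a hyphen.

```python
def to_ranges(residues):
    """Compress residue numbers into 'a-b' segments, e.g. [4, 5, 6, 9] -> '4-6, 9'."""
    res = sorted(residues)
    segments, start, prev = [], res[0], res[0]
    for r in res[1:]:
        if r != prev + 1:              # a gap closes the current segment
            segments.append((start, prev))
            start = r
        prev = r
    segments.append((start, prev))
    return ", ".join(f"{a}-{b}" if a != b else f"{a}" for a, b in segments)


def mapping_table(labels, first_resid=1):
    """{site: (number of residues, range string)} from per-residue site labels."""
    sites = {}
    for offset, site in enumerate(labels):
        sites.setdefault(site, []).append(first_resid + offset)
    return {site: (len(r), to_ranges(r)) for site, r in sites.items()}
```

Applied to the KMC-CG SD1 residues listed above, `to_ranges` returns "4-13, 16-31, 86-103", covering 44 residues.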

In contrast to those of the ED-CG model, the CG sites of the KMC-CG model are always located in regions where residues are densely distributed, whereas ED-CG sites may lie in regions where residues are dispersed, such as the two sites colored cyan and orange in Figure 3c. Such dispersed sites can introduce systematic errors when the intermolecular CG interactions of an ED-CG model are parametrized. The orange site, for example, would need a larger repulsive radius along the α-helix and a smaller one perpendicular to it, which is difficult to realize in practice for an ED-CG model. By contrast, the CG sites of KMC-CG models tend to be spherically symmetric without being biased by the primary sequence, which mitigates anisotropy to some extent and is an advantage of KMC-CG over ED-CG models at the same CG resolution.

The four-site model of G-ATP is also of interest because it is considered the most coarse-grained model that can preserve the fundamental dynamics of actin. Based on previous work,56,57 a four-site CG model proposed by Chu and Voth27 (Figure 4a) was used as an intuitive reference model to test the performance of KMC-CG at this resolution. KMC-CG was run only once for comparison with the best ED-CG result. As shown in Figure 4d, the four dynamic domains generated by KMC-CG are almost identical to those of the reference model, differing only at the domain boundaries, by at most ten residues. In contrast, the ED-CG model cannot reproduce all four domains of the reference model; one of its CG sites (green in Figure 4c, residues 289–375) is even located within the region of another CG bead (red, residues 1–51). Furthermore, because ED-CG is sequence-dependent, it cannot capture dynamic domains that are noncontiguous in sequence. G-ATP models with numbers of CG sites other than 4 and 11 also deserve attention and are discussed separately in the last section. In summary, applied to G-ATP, KMC-CG provides a systematic and consistent way to generate CG models that agree better with intuition (which is available in the case of actin but may not always be) and should give more accurate behavior when subsequently parametrized with a bottom-up CG force field.9

Figure 4.


Different four-site CG models of ATP-bound G-actin. (a) The intuitive reference model from the literature:27 SD1 (1–32, 70–144, 338–375), SD2 (33–69), SD3 (145–180, 270–337), and SD4 (181–269). (b) The KMC-CG model: SD1 (1–34, 69–137, 339–375), SD2 (35–68), SD3 (138–183, 260–338), and SD4 (184–259). (c) The ED-CG model: SD1 (1–51), SD2 (52–192), SD3 (193–288), and SD4 (289–375). (d) The three models are mapped onto the RMSF curves with colors corresponding to the four subdomains in panels a, b, and c. The x axis represents the index of the Cα atoms, and the y axis is the RMSF of each Cα atom. Each CG bead is located at the COM of its corresponding domain in each model.
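The boundary agreement in Figure 4 can be quantified by comparing per-residue labels built from the ranges listed in panels a and b. The helper below is hypothetical and assumes the four subdomains are listed in the same order (SD1 to SD4) in both models, so the labels can be compared directly without any matching step.

```python
import numpy as np


def labels_from_ranges(domains, n_res):
    """Per-residue site labels (0-based) from inclusive, 1-based residue ranges."""
    labels = np.full(n_res, -1)
    for site, ranges in enumerate(domains):
        for start, end in ranges:
            labels[start - 1:end] = site
    return labels


# Ranges transcribed from the Figure 4 caption (SD1, SD2, SD3, SD4)
reference = [[(1, 32), (70, 144), (338, 375)], [(33, 69)],
             [(145, 180), (270, 337)], [(181, 269)]]
kmc_cg = [[(1, 34), (69, 137), (339, 375)], [(35, 68)],
          [(138, 183), (260, 338)], [(184, 259)]]

ref_labels = labels_from_ranges(reference, 375)
kmc_labels = labels_from_ranges(kmc_cg, 375)
mismatch = np.flatnonzero(ref_labels != kmc_labels) + 1   # 1-based residue numbers
print(len(mismatch))  # 24
```

With these ranges, the 24 differing residues are 33–34, 69, 138–144, 181–183, 260–269, and 338, all of which lie at domain boundaries.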

CG Models of the HIV-1 CA Pentamer

Following its successful application to a globular protein, KMC-CG was next applied to protein complexes composed of quasi-linear protein components. The HIV-1 capsid (CA) pentamer is a complex of five HIV-1 CA monomers and is known to stabilize high-curvature sites during maturation of the viral capsid.4,36,58 Although the five monomers are structurally quasi-equivalent, their dynamics in the 200 ns trajectory differ slightly because of the limited sampling of the simulation. Therefore, a four-site CG model was constructed for each monomer from the equilibrium AA trajectory of that chain. The optimal KMC-CG mappings were found to differ among monomers. The mapping generated from the second monomer, which best reproduced the RMSF of the AA-mapped CG trajectory, was selected, and this same four-site mapping was applied to every monomer to construct the twenty-site KMC-CG model of the whole pentamer (Figure 5a). Other RMSF matching results are provided in the Supporting Information.
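Building the twenty-site model from one monomer mapping is a simple relabeling. The sketch below assumes (our assumption) that the five chains are stored consecutively with identical residue numbering 1–231, and follows the Figure 5d convention in which every four consecutive CG sites represent one monomer; the function names are hypothetical.

```python
import numpy as np


def labels_from_ranges(domains, n_res):
    """Per-residue site labels (0-based) from inclusive, 1-based residue ranges."""
    labels = np.full(n_res, -1)
    for site, ranges in enumerate(domains):
        for start, end in ranges:
            labels[start - 1:end] = site
    return labels


def replicate_mapping(monomer_labels, n_copies, sites_per_copy):
    """Apply one monomer mapping to every chain; chain c uses sites c*M ... c*M + M - 1."""
    monomer_labels = np.asarray(monomer_labels)
    return np.concatenate([monomer_labels + c * sites_per_copy for c in range(n_copies)])


# Four-site KMC-CG mapping of one CA monomer (ranges from the Figure 5a caption)
monomer = labels_from_ranges(
    [[(1, 73), (127, 146)], [(74, 126)], [(147, 225)], [(226, 231)]], 231)
pentamer = replicate_mapping(monomer, n_copies=5, sites_per_copy=4)
print(pentamer.shape, pentamer.max())  # (1155,) 19
```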

Figure 5.


Two different twenty-site CG models of the HIV-1 CA pentamer. (a) The KMC-CG model (side view): SD1 (1–73, 127–146), SD2 (74–126), SD3 (147–225), and SD4 (226–231). One monomer of the CA pentamer is colored green, orange, blue, and red, and the other four monomers are colored gray. (b) The ED-CG model (side view): SD1 (1–77), SD2 (78–126), SD3 (127–199), and SD4 (200–231). (c) The two models mapped onto the RMSF curves, with four colors corresponding to the subdomains in panels a and b. The x axis is the index of the Cα atoms, and the y axis represents the RMSF of each Cα atom. (d) The RMSF curves obtained from CG simulations of the KMC-CG model (red) and from the mapped all-atom trajectories (blue for the KMC-CG model and gray for the ED-CG model). The x axis represents the index of the CG sites, with every four consecutive CG sites representing a monomer, and the y axis is the RMSF of each CG site. In both models, the CG beads are located at the COM of their corresponding domains.

In the KMC-CG model, the two boundaries between SD1 (Figure 5a, orange) and SD2 (Figure 5a, green) are located at an α-helix hinge region and at the junction between an α-helix and a loop, respectively, both of which have low RMSF, whereas the boundary Cα atom between SD3 and SD4 has a relatively high RMSF. This is reasonable because the six Cα atoms of SD4 fluctuate strongly and move in a highly correlated fashion, so they warrant a separate CG site that describes their characteristic motion. The model agrees reasonably well with intuition, as the residues of each CG site are densely and spherically distributed and move collectively. For comparison, a four-site CG model built with the ED-CG method is shown in Figure 5b. ED-CG assigned a protruding α-helix on the left to SD3. It is more appropriate, however, to assign that α-helix to SD1, since the two are spatially close and more closely related in their dynamics ($\chi_0^2$ of the KMC-CG model is smaller than that of the ED-CG model).

To evaluate the dynamical performance of the KMC-CG model, HeteroENM was used to parametrize the CG force field of the twenty-site CA pentamer, and five independent 200 ns CG trajectories were generated to calculate the average RMSF of each CG site. In Figure 5d, this RMSF is compared with that of the reference trajectory obtained by mapping the all-atom trajectory onto the selected CG mapping (the AA-mapped CG trajectory). The error bars of the CG RMSF curve, which are too small to be easily seen in the graph, are given in the Supporting Information. The fluctuations show that the twenty-site KMC-CG model captures most of the essential structural features, including the high fluctuation of SD4 (the five peaks in Figure 5d). Despite the differences among the KMC-CG mappings of different monomers, the RMSF curves of the CG trajectories and the AA-mapped CG trajectory align very well. Moreover, the RMSF curve of each CA monomer shows the same pattern, indicating that the monomers' equilibrium modes of motion are similar enough to be described by the same CG mapping. Compared with the ED-CG model, the KMC-CG model captures and emphasizes the highly fluctuating domains (Figure 5d), especially the remarkably flexible C-terminal domain of each monomer, and thus effectively preserves structural and dynamic information. In the ED-CG model, the C-terminal domain is grouped into the same CG site as an α-helix, which can damp the motion of that site and lose some information about the domain's high flexibility. These results suggest that KMC-CG can also build well-representative CG models of protein complexes.
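Two pieces of this comparison can be sketched: mapping an atomistic trajectory onto CG sites (here as the mass-weighted COM of the atoms assigned to each site) and averaging per-site RMSF over independent CG runs with a standard error of the mean. This is a hedged NumPy illustration with hypothetical function names; it does not reproduce the error-bar procedure of the Supporting Information.

```python
import numpy as np


def map_to_cg(coords, labels, masses, n_sites):
    """AA-mapped CG trajectory: mass-weighted COM of the atoms of each site.

    coords: (n_t, n_atoms, 3), labels: (n_atoms,), masses: (n_atoms,)
    """
    out = np.empty((coords.shape[0], n_sites, 3))
    for s in range(n_sites):
        mask = labels == s
        w = masses[mask] / masses[mask].sum()            # normalized weights
        out[:, s, :] = np.einsum("a,tad->td", w, coords[:, mask, :])
    return out


def mean_rmsf(rmsf_runs):
    """Per-site mean RMSF and standard error over runs; input shape (n_runs, n_sites)."""
    runs = np.asarray(rmsf_runs, dtype=float)
    mean = runs.mean(axis=0)
    sem = runs.std(axis=0, ddof=1) / np.sqrt(runs.shape[0])
    return mean, sem
```

With five independent CG runs, `mean_rmsf` would receive a (5, 20) array for the twenty-site pentamer.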

CG Models of the Arp2/3 Complex

The last biomolecular system investigated with KMC-CG is the Arp2/3 complex, a protein complex that is considerably larger than the previous examples. The Arp2/3 complex, which comprises about two thousand amino acids, is pivotal in the polymerization of actin filaments.59–62 The complex has seven subunits: Arp2 and Arp3 bind the mother and daughter actin filaments, while the remaining five subunits, ARPC1–C5, serve as supporting subunits that facilitate binding of Arp2/3 to the mother filament.59 The Arp2/3 complex is known to undergo a transition between inactive and active states: the inactive complex exists freely in the absence of activation by nucleation-promoting factors (NPFs),62 whereas the active complex is attached to the mother filament, followed by growth of the daughter filament (Figure 6a).

Figure 6.

(a) The cryo-EM structure of the branch junction from Ding et al.42 The gray sections represent the mother filament (left) and daughter filament (right) of actin, and the Arp2/3 complex sits at the branch junction. Panels b and c are the KMC-CG models of the inactive and active Arp2/3 complexes, respectively. Different colors correspond to the seven subunits, and spheres of the same color represent the CG beads of that subunit. The ARPC1 insert region is colored opaque green. Panels d and e are the RMSF curves obtained from CG simulations of the KMC-CG model (blue) and from the mapped all-atom trajectory (red) for the inactive and active states, respectively. The x axis is the CG site index, and the y axis is the RMSF of each CG site. The seven subunits are Arp3 (CG sites 1–11, plus site 35 for the nucleotide), Arp2 (12–22, plus 36 for the nucleotide), ARPC1 (23–26), ARPC2 (27–28), ARPC3 (29–30), ARPC4 (31–32), and ARPC5 (33–34). In each model, each CG site is the center of mass (COM) of its corresponding domain.

The complex undergoes a large-scale conformational change upon activation. At a resolution of about 50 residues per CG site, a 36-site model of the Arp2/3 complex was constructed with the KMC-CG method from the AA trajectory of the inactive state. Arp2 and Arp3 each have 12 CG sites, one of which represents the bound nucleotide (sites 35 and 36 in Figure 6). Four CG sites were assigned to the ARPC1 subunit, and two CG sites to each of the remaining ARPC2–C5 subunits. The CG mapping obtained from the inactive state was then applied to the active state without any modification, giving the CG models of both states depicted in Figure 6b,c. The ARPC1 subunit contains the ARPC1 insert region (residues 298–307), an α-helix that protrudes to bind the mother filament during actin filament branching.42 The KMC-CG method automatically identified the insert region as a single CG site (residues 292–310; Figure 6b, colored opaque green).
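Reusing one mapping for two states amounts to applying the same linear COM mapping operator to both sets of coordinates. The sketch below is illustrative only: it assumes the mapping is stored as one site index per atom, with every site owning at least one atom, and `com_mapping_matrix` and `map_trajectory` are hypothetical helpers, not OpenMSCG functions.

```python
import numpy as np


def com_mapping_matrix(site_of_atom, masses, n_sites):
    """Build an (n_sites, n_atoms) matrix whose rows hold the mass weights of each CG site."""
    site_of_atom = np.asarray(site_of_atom)
    masses = np.asarray(masses, dtype=float)
    M = np.zeros((n_sites, len(masses)))
    M[site_of_atom, np.arange(len(masses))] = masses
    M /= M.sum(axis=1, keepdims=True)   # normalize each row -> center of mass (assumes no empty site)
    return M


def map_trajectory(M, traj):
    """Map (n_frames, n_atoms, 3) atomistic coordinates to (n_frames, n_sites, 3) CG coordinates."""
    return np.einsum("sa,fax->fsx", M, np.asarray(traj, dtype=float))


# One mapping, determined once (e.g., from the inactive state) ...
M = com_mapping_matrix(site_of_atom=[0, 0, 1], masses=[1.0, 3.0, 2.0], n_sites=2)
# ... applied unchanged to the coordinates of two different states (toy single-frame examples).
inactive = np.array([[[0.0, 0.0, 0.0], [4.0, 0.0, 0.0], [1.0, 1.0, 1.0]]])
active = np.array([[[0.0, 0.0, 0.0], [8.0, 0.0, 0.0], [2.0, 2.0, 2.0]]])
cg_inactive = map_trajectory(M, inactive)
cg_active = map_trajectory(M, active)
```

Because the operator is fixed, any difference between the two CG models comes from the atomistic coordinates alone, which is what makes the inactive/active comparison in Figure 6 meaningful.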

HeteroENM was then used to fit the equilibrium pairwise effective harmonic interactions for both states. For each state, five independent 200 ns long CG trajectories were generated to calculate the average RMSF of each CG site; the error statistics of the RMSF curves are given in the Supporting Information. As Figure 6d,e show, the RMSF curves of the CG trajectories agree closely with those of the AA mapped CG trajectories, except for CG sites 23–24, 26–27, and 31–33 of the active state, which deviate slightly from the reference curve. This deviation is acceptable, since the mapping was derived from the inactive state and the complex undergoes a conformational change upon activation. Another notable feature is the difference in the fluctuations of ARPC1 (CG sites 23–26) between the two states. Because the mother filament was removed from the active structure, the ARPC1 insert region of the active state appears to have more freedom of motion, as reflected by the two RMSF peaks at CG sites 23–24. Further structural features reported by Ding et al.,42 such as the flattening of the complex dihedrals, can also be identified with the KMC-CG method; these results are provided in the Supporting Information.
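HeteroENM, described in ref 24, fits one harmonic spring constant per CG site pair so that the pair-distance fluctuations of the elastic network match those of the AA mapped CG trajectory. The following is a simplified sketch in that spirit, not the published algorithm or its OpenMSCG implementation. It assumes the network fluctuations are computed analytically in the harmonic approximation from the pseudo-inverse of the network Hessian, and it uses a plain fixed-point update $k_{ij} \leftarrow k_{ij} + \alpha k_BT(1/\sigma^2_{\mathrm{target}} - 1/\sigma^2_{\mathrm{ENM}})$, where the step size `alpha` and all function names are hypothetical. For networks with redundant springs, convergence of this simple update is not guaranteed.

```python
import numpy as np


def enm_hessian(x, k):
    """3n x 3n Hessian of V = sum_{i<j} k_ij/2 (|r_i - r_j| - d_ij^0)^2 at the reference structure x."""
    n = len(x)
    H = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            if k[i, j] == 0.0:
                continue
            e = (x[j] - x[i]) / np.linalg.norm(x[j] - x[i])   # unit bond vector
            block = k[i, j] * np.outer(e, e)
            si, sj = slice(3 * i, 3 * i + 3), slice(3 * j, 3 * j + 3)
            H[si, si] += block
            H[sj, sj] += block
            H[si, sj] -= block
            H[sj, si] -= block
    return H


def pair_variances(x, k, pairs, kT=1.0, tol=1e-8):
    """Harmonic-approximation variance of each pair distance, <(d_ij - <d_ij>)^2>."""
    w, V = np.linalg.eigh(enm_hessian(x, k))
    keep = w > tol * w.max()                         # drop rigid-body and other zero modes
    C = kT * (V[:, keep] / w[keep]) @ V[:, keep].T   # Cartesian covariance = kT * H^+
    out = []
    for i, j in pairs:
        e = (x[j] - x[i]) / np.linalg.norm(x[j] - x[i])
        si, sj = slice(3 * i, 3 * i + 3), slice(3 * j, 3 * j + 3)
        block = C[si, si] + C[sj, sj] - C[si, sj] - C[sj, si]
        out.append(e @ block @ e)
    return np.array(out)


def fit_hetero_enm(x, pairs, target_var, kT=1.0, alpha=0.5, n_iter=200, k0=1.0):
    """Iteratively adjust spring constants until the ENM pair variances match target_var."""
    n = len(x)
    k = np.zeros((n, n))
    for i, j in pairs:
        k[i, j] = k[j, i] = k0
    for _ in range(n_iter):
        var = pair_variances(x, k, pairs, kT)
        for (i, j), v_t, v in zip(pairs, target_var, var):
            # stiffen springs that fluctuate too much, soften those that fluctuate too little
            k[i, j] = k[j, i] = max(k[i, j] + alpha * kT * (1.0 / v_t - 1.0 / v), 1e-6)
    return k


# Toy check on a triangle of three CG sites with known "true" springs.
x = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
pairs = [(0, 1), (0, 2), (1, 2)]
k_true = np.zeros((3, 3))
for (i, j), kij in zip(pairs, [2.0, 5.0, 1.0]):
    k_true[i, j] = k_true[j, i] = kij
target = pair_variances(x, k_true, pairs)   # stands in for variances from the AA mapped CG trajectory
k_fit = fit_hetero_enm(x, pairs, target)
```

The triangle is chosen because each spring then controls its own distance independently, so the fit provably recovers the true constants; in a real 36-site network the variances are coupled and the iteration has to be monitored.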

Further Comparison with ED-CG

Because of its sequence-continuity (linear) constraints, ED-CG cannot identify residues that are discontinuous in sequence but move collectively, so the fluctuation residual $\chi_0^2$ (eq 2) obtained by ED-CG can still be reduced further. We compare the $\chi_0^2$ values of ED-CG and KMC-CG for G-ATP with different numbers of CG sites N. As shown in Figure 7, the residuals obtained by KMC-CG are significantly smaller than those from ED-CG when N is less than 19. This indicates that the sequence-based ED-CG indeed limits the global search for the best partition into dynamic domains, whereas KMC-CG removes the linear constraints and extends the search to the entire space. The position residual $\chi_a^2$ constrains the distance between a residue and its corresponding CG bead, which prevents residues that are spatially distant but happen to move in a correlated fashion from being unreasonably assigned to the same CG site. However, when N ≥ 19 (a resolution finer than about 20 residues per CG site), the $\chi_0^2$ of ED-CG becomes slightly smaller than that of KMC-CG, because a larger number of CG sites makes the search space too large for KMC-CG to reduce the residual significantly further. KMC-CG therefore provides more stable results for highly coarse-grained systems but may not do so for higher-resolution systems. In addition, β = 1 works for most systems and generates consistent mappings when the resolution is coarser than about 20 residues per CG site. For more fine-grained systems, the weighting factor β of the position residual should be reduced to increase the relative weight of the fluctuation term. For example, when β is changed to 0.5, one obtains $\chi_0^2$ = 105.1 for ED-CG and $\chi_0^2$ = 103.0 for KMC-CG at N = 19.
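The effect of the sequence-continuity constraint can be illustrated with a brute-force toy example. Because eq 2 is not reproduced in this section, the sketch assumes an ED-CG-style fluctuation residual: the mean-squared deviation of each residue's displacement from the unweighted mean displacement of its CG site, normalized by 3n. It deliberately omits the position residual $\beta\chi_a^2$ (the term that, in KMC-CG, stops spatially distant but correlated residues from being grouped) and all of KMC-CG's optimization machinery. It only shows that an unconstrained partition can never do worse than the best sequence-contiguous one, and does strictly better when residues far apart in sequence move together. `chi0_sq`, `best_contiguous`, and `best_unconstrained` are hypothetical names.

```python
import itertools

import numpy as np


def chi0_sq(fluct, labels):
    """Assumed ED-CG-style residual; fluct has shape (n_frames, n_residues, 3)."""
    labels = np.asarray(labels)
    total = 0.0
    for s in np.unique(labels):
        members = fluct[:, labels == s, :]                 # displacements of residues in site s
        site = members.mean(axis=1, keepdims=True)         # site displacement (unweighted mean)
        total += ((members - site) ** 2).sum(axis=2).mean(axis=0).sum()
    return total / (3 * fluct.shape[1])


def best_contiguous(fluct, n_sites):
    """ED-CG-like search space: sites are consecutive blocks of the sequence."""
    n = fluct.shape[1]
    best = np.inf
    for cuts in itertools.combinations(range(1, n), n_sites - 1):
        labels = np.searchsorted(cuts, np.arange(n), side="right")
        best = min(best, chi0_sq(fluct, labels))
    return best


def best_unconstrained(fluct, n_sites):
    """KMC-CG-like search space: any assignment of residues to sites (exhaustive, tiny n only)."""
    n = fluct.shape[1]
    best = np.inf
    for labels in itertools.product(range(n_sites), repeat=n):
        if len(set(labels)) == n_sites:                    # every site must be used
            best = min(best, chi0_sq(fluct, labels))
    return best


# Toy data: residues 0, 1, 4, 5 share one collective motion; residues 2, 3 share another.
rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 200, 3))
fluct = np.stack([a, a, b, b, a, a], axis=1)               # (200, 6, 3)
print(best_contiguous(fluct, 2), best_unconstrained(fluct, 2))
```

Exhaustive enumeration is only feasible for a handful of residues; it is used here purely to make the contrast between the two search spaces exact.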

Figure 7.

Comparison of the fluctuation residuals $\chi_0^2$ (eq 2) obtained by KMC-CG and ED-CG. The x axis is the number of CG sites N used for the CG model of G-ATP, and the y axis is the corresponding fluctuation residual. When N ≤ 18, the $\chi_0^2$ of KMC-CG is significantly smaller than that of ED-CG; when N ≥ 19, the $\chi_0^2$ of ED-CG becomes slightly smaller than that of KMC-CG.

Compared with ED-CG, KMC-CG also has an additional parameter γ that sets the weight of the continuity term $\chi_b^2$. Testing γ values between 1 and 500 showed that its effect on the final CG mapping is negligible, because the initial K-means clustering step already yields a preliminary but sufficiently small residual by minimizing $\chi_0^2 + \beta\chi_a^2$. The clustering results are therefore insensitive to perturbations in γ, and the subsequent stochastic parallel minimization works well with the default value γ = 1.
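Why plain K-means can minimize $\chi_0^2 + \beta\chi_a^2$ follows from a standard identity. If both residuals are unweighted sums of squared deviations from site means, normalized by 3n, then $3n(\chi_0^2 + \beta\chi_a^2)$ equals the ordinary K-means objective on per-residue feature vectors that concatenate the displacement trajectory (scaled by $1/\sqrt{T}$) with the mean position (scaled by $\sqrt{\beta}$). The sketch below assumes exactly those forms, which may differ from eq 2 and the paper's definition of $\chi_a^2$ (mass weighting, for instance, would break the identity). It runs plain Lloyd iterations with random restarts and omits both the continuity term $\gamma\chi_b^2$ and the subsequent stochastic parallel minimization. `kmc_features`, `within_ss`, and `kmeans` are hypothetical names.

```python
import numpy as np


def kmc_features(fluct, mean_pos, beta):
    """Per-residue features whose squared distances reproduce chi0^2 + beta*chi_a^2 (assumed forms).

    fluct: (T, n, 3) residue displacements; mean_pos: (n, 3) average residue positions.
    """
    T, n, _ = fluct.shape
    dyn = fluct.transpose(1, 0, 2).reshape(n, 3 * T) / np.sqrt(T)
    return np.hstack([dyn, np.sqrt(beta) * mean_pos])


def within_ss(X, labels):
    """K-means objective: summed squared distance of each point to its cluster mean."""
    return sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
               for c in np.unique(labels))


def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with random restarts; returns the best labels and objective."""
    rng = np.random.default_rng(seed)
    best_labels, best_obj = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                            for c in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        obj = within_ss(X, labels)
        if obj < best_obj:
            best_labels, best_obj = labels, obj
    return best_labels, best_obj


# Toy system: residues 0-2 sit near the origin and share motion a(t);
# residues 3-5 sit 10 units away and share motion b(t).
rng = np.random.default_rng(1)
T = 100
a, b = rng.normal(size=(2, T, 3))
fluct = np.stack([a, a, a, b, b, b], axis=1) + 0.05 * rng.normal(size=(T, 6, 3))
mean_pos = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [10, 0, 0], [11, 0, 0], [10, 1, 0]], float)
labels, obj = kmeans(kmc_features(fluct, mean_pos, beta=1.0), k=2)
```

Raising or lowering `beta` here shifts the balance between grouping by correlated motion and grouping by spatial proximity, which mirrors the role of β discussed for fine-grained mappings.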

Conclusions

In this work, a systematic and generalized new method, KMC-CG, has been presented for defining optimal CG mappings of large biomolecules. The method builds upon the earlier ED-CG approach,13,20 which is limited by constraints on sequence continuity. KMC-CG removes these constraints and adds two new residuals, a spatial vicinity term $\beta\chi_a^2$ and a chain continuity term $\gamma\chi_b^2$, to the original ED-CG residual $\chi_0^2$. This allows KMC-CG to explore a much larger space of CG mappings, extending its reach from quasi-one-dimensional to three-dimensional protein structures and complexes, and therefore to discover more physically reasonable and useful CG mappings. The K-means clustering algorithm23 variationally optimizes the mixed residual $\chi^2$ efficiently and stably. Given a specified number of CG sites N, KMC-CG partitions the residues of a large biomolecule into N dynamic domains that capture both dynamic and structural features.

We applied the KMC-CG method to large biomolecular systems at three different levels: a single protein (ATP-bound G-actin), quasi-linear protein structures in a complex (the HIV-1 CA pentamer), and a large 3D protein complex (the Arp2/3 complex). The resulting CG models proved effective and robust, correctly grouping the biologically functional domains identified in previous studies3,26,42,52,53,57 into the corresponding CG sites. The four-site G-ATP model perfectly reproduces the intuitive model from the literature,27 which reflects the opening/closing motion of the actin cleft. The 11-site G-ATP CG model identifies most of the functional dynamic domains, such as the D-loop, H-loop, and W-loop. The KMC-CG four-site model of the HIV-1 CA monomer largely reproduces the RMSF of the whole pentamer. The KMC-CG model of the Arp2/3 complex captures the ARPC1 insert region in its own CG site and may help identify collective variables that characterize the dynamics of the active/inactive state transition. Fluctuation matching for these systems shows generally excellent agreement between the AA mapped CG reference trajectories and the CG trajectories.

Highly coarse-grained models have proven very useful for modeling complex biomolecular processes, such as the polymerization and conversion of actin filaments63,64 and HIV-1 viral capsid assembly.4 Key biological processes involve many large biomolecules and occur on long length and time scales. A simple but physically accurate CG model can therefore greatly improve the efficiency of computer simulations of such processes while preserving the critical physical behavior of the constituent biomolecules. Compared with ED-CG, KMC-CG performs significantly better and generates physically accurate CG models for highly coarse-grained systems, at resolutions coarser than about 20 residues per CG site. Because ED-CG seeks groups of sequentially contiguous residues that move collectively, it may miss important dynamic domains that are spatially, but not sequentially, contiguous; KMC-CG considers both spatial contiguity and the synchronicity of motion, avoiding the loss of such critical information in a CG model. Moreover, the domains defined by KMC-CG tend to be roughly spherical, giving more uniform domain surfaces, and their boundary atoms typically lie in hinge regions, giving distinct boundaries between dynamic domains. The KMC-CG algorithm is therefore an improvement on the current sequential ED-CG mapping scheme and provides reasonable CG mappings for biomolecular systems that reproduce the relevant structural and (at least to lowest order) dynamical features.

It is also worth briefly discussing how many CG sites are required to represent large biomolecules with the KMC-CG method. To a significant extent, this depends on the scale of the problem being studied. For example, a four-site model of G-ATP is sufficient to represent the ATP cleft, but more CG sites are needed to study the D-loop. Future work will therefore develop ways to determine both the minimum and the optimal number of CG sites needed to preserve the functional domains, possibly using information entropy.65 More broadly, better CG models are needed for many other large biomolecules, such as HIV-1 protease and Gag,66 so the KMC-CG methodology will also be applied to additional systems to support the study of their complex biomolecular processes.

Acknowledgments

This material is based upon work supported in part by the National Institute of General Medical Sciences (NIGMS), National Institutes of Health (NIH grant R01GM063796), and in part by the National Science Foundation (NSF grant CHE-2102677). The computational resources for this research were provided by the University of Chicago Research Computing Center (RCC) Midway supercomputer and the NIH-funded Beagle3 HPC cluster (Award Number S10OD028655).

Data Availability Statement

The software is a part of the OpenMSCG package67 (https://software.rcc.uchicago.edu/mscg/).

Supporting Information Available

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jctc.3c01053.

  • Additional error statistics and analysis of CG mappings (PDF)

Author Contributions

J.W. and W.X. contributed equally to this work. J.W. conceived the idea. J.W., W.X., and G.A.V. designed the research. W.X. wrote the source code. J.W. and W.X. performed the simulations. All authors analyzed the results and wrote the manuscript.

The authors declare no competing financial interest.

Supplementary Material

ct3c01053_si_001.pdf (7.5MB, pdf)

References

  1. Adcock S. A.; McCammon J. A. Molecular Dynamics: Survey of Methods for Simulating the Activity of Proteins. Chem. Rev. 2006, 106, 1589–1615. 10.1021/cr040426m.
  2. Karplus M.; McCammon J. A. Molecular Dynamics Simulations of Biomolecules. Nat. Struct. Biol. 2002, 9, 646–652. 10.1038/nsb0902-646.
  3. Pollard T. D.; Cooper J. A. Actin, a Central Player in Cell Shape and Movement. Science 2009, 326, 1208–1212. 10.1126/science.1175862.
  4. Gupta M.; Pak A. J.; Voth G. A. Critical Mechanistic Features of HIV-1 Viral Capsid Assembly. Sci. Adv. 2023, 9, eadd7434. 10.1126/sciadv.add7434.
  5. Ganser-Pornillos B. K.; Yeager M.; Sundquist W. I. The Structural Biology of HIV Assembly. Curr. Opin. Struct. Biol. 2008, 18, 203–217. 10.1016/j.sbi.2008.02.001.
  6. Tozzini V. Coarse-Grained Models for Proteins. Curr. Opin. Struct. Biol. 2005, 15, 144–150. 10.1016/j.sbi.2005.02.005.
  7. Saunders M. G.; Voth G. A. Coarse-Graining Methods for Computational Biology. Annu. Rev. Biophys. 2013, 42, 73–93. 10.1146/annurev-biophys-083012-130348.
  8. Noid W. G. Perspective: Coarse-Grained Models for Biomolecular Systems. J. Chem. Phys. 2013, 139, 090901. 10.1063/1.4818908.
  9. Jin J.; Pak A. J.; Durumeric A. E. P.; Loose T. D.; Voth G. A. Bottom-up Coarse-Graining: Principles and Perspectives. J. Chem. Theory Comput. 2022, 18, 5759–5791. 10.1021/acs.jctc.2c00643.
  10. Izvekov S.; Voth G. A. A Multiscale Coarse-Graining Method for Biomolecular Systems. J. Phys. Chem. B 2005, 109, 2469–2473. 10.1021/jp044629q.
  11. Zhou J.; Thorpe I. F.; Izvekov S.; Voth G. A. Coarse-Grained Peptide Modeling Using a Systematic Multiscale Approach. Biophys. J. 2007, 92, 4289–4303. 10.1529/biophysj.106.094425.
  12. Noid W. G.; Chu J. W.; Ayton G. S.; Krishna V.; Izvekov S.; Voth G. A.; Das A.; Andersen H. C. The Multiscale Coarse-Graining Method. I. A Rigorous Bridge Between Atomistic and Coarse-Grained Models. J. Chem. Phys. 2008, 128, 244114. 10.1063/1.2938860.
  13. Zhang Z.; Lu L.; Noid W. G.; Krishna V.; Pfaendtner J.; Voth G. A. A Systematic Methodology for Defining Coarse-Grained Sites in Large Biomolecules. Biophys. J. 2008, 95, 5073–5083. 10.1529/biophysj.108.139626.
  14. Arkhipov A.; Freddolino P. L.; Schulten K. Stability and Dynamics of Virus Capsids Described by Coarse-Grained Modeling. Structure 2006, 14, 1767–1777. 10.1016/j.str.2006.10.003.
  15. Wu Z.; Zhang Y.; Zhang J. Z.; Xia K.; Xia F. Determining Optimal Coarse-Grained Representation for Biomolecules Using Internal Cluster Validation Indexes. J. Comput. Chem. 2020, 41, 14–20. 10.1002/jcc.26070.
  16. Chakraborty M.; Xu C.; White A. D. Encoding and Selecting Coarse-Grain Mapping Operators with Hierarchical Graphs. J. Chem. Phys. 2018, 149, 134106. 10.1063/1.5040114.
  17. Ho K. C.; Hamelberg D. Combinatorial Coarse-Graining of Molecular Dynamics Simulations for Detecting Relationships between Local Configurations and Overall Conformations. J. Chem. Theory Comput. 2018, 14, 6026–6034. 10.1021/acs.jctc.8b00333.
  18. Webb M. A.; Delannoy J. Y.; de Pablo J. J. Graph-Based Approach to Systematic Molecular Coarse-Graining. J. Chem. Theory Comput. 2019, 15, 1199–1208. 10.1021/acs.jctc.8b00920.
  19. Zhang Z.; Voth G. A. Coarse-Grained Representations of Large Biomolecular Complexes from Low-Resolution Structural Data. J. Chem. Theory Comput. 2010, 6, 2990–3002. 10.1021/ct100374a.
  20. Zhang Z.; Pfaendtner J.; Grafmüller A.; Voth G. A. Defining Coarse-Grained Representations of Large Biomolecules and Biomolecular Complexes from Elastic Network Models. Biophys. J. 2009, 97, 2327–2337. 10.1016/j.bpj.2009.08.007.
  21. Zhang Z.; Sanbonmatsu K. Y.; Voth G. A. Key Intermolecular Interactions in the E. coli 70S Ribosome Revealed by Coarse-Grained Analysis. J. Am. Chem. Soc. 2011, 133, 16828–16838. 10.1021/ja2028487.
  22. Li M.; Zhang J. Z.; Xia F. Constructing Optimal Coarse-Grained Sites of Huge Biomolecules by Fluctuation Maximization. J. Chem. Theory Comput. 2016, 12, 2091–2100. 10.1021/acs.jctc.6b00016.
  23. Hartigan J. A.; Wong M. A. Algorithm AS 136: A K-Means Clustering Algorithm. J. R. Stat. Soc. Ser. C. Appl. Stat. 1979, 28, 100–108. 10.2307/2346830.
  24. Lyman E.; Pfaendtner J.; Voth G. A. Systematic Multiscale Parameterization of Heterogeneous Elastic Network Models of Proteins. Biophys. J. 2008, 95, 4183–4192. 10.1529/biophysj.108.139733.
  25. Van Der Spoel D.; Lindahl E.; Hess B.; Groenhof G.; Mark A. E.; Berendsen H. J. GROMACS: Fast, Flexible, and Free. J. Comput. Chem. 2005, 26, 1701–1718. 10.1002/jcc.20291.
  26. Graceffa P.; Dominguez R. Crystal Structure of Monomeric Actin in the ATP State: Structural Basis of Nucleotide-Dependent Actin Dynamics. J. Biol. Chem. 2003, 278, 34172–34180. 10.1074/jbc.M303689200.
  27. Chu J.-W.; Voth G. A. Allostery of Actin Filaments: Molecular Dynamics Simulations and Coarse-Grained Analysis. Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 13111–13116. 10.1073/pnas.0503732102.
  28. Zsolnay V.; Katkar H. H.; Chou S. Z.; Pollard T. D.; Voth G. A. Structural Basis for Polarized Elongation of Actin Filaments. Proc. Natl. Acad. Sci. U. S. A. 2020, 117, 30458–30464. 10.1073/pnas.2011128117.
  29. Humphrey W.; Dalke A.; Schulten K. VMD: Visual Molecular Dynamics. J. Mol. Graph. 1996, 14, 33–38. 10.1016/0263-7855(96)00018-5.
  30. Mackerell A. D.; Bashford D.; Bellott M.; Dunbrack R. L.; Evanseck J. D.; Field M. J.; Fischer S.; Gao J.; Guo H.; Ha S.; Joseph-Mccarthy D.; Kuchnir L.; Kuczera K.; Lau F. T. K.; Mattos C.; Michnick S.; Ngo T.; Nguyen D. T.; Prodhom B.; Reiher W. E.; Roux B.; Schlenkrich M.; Smith J. C.; Stote R.; Straub J.; Watanabe M.; Wiórkiewicz-Kuczera J.; Yin D.; Karplus M. All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins. J. Phys. Chem. B 1998, 102, 3586–3616. 10.1021/jp973084f.
  31. Brooks B. R.; Brooks C. L. 3rd; Mackerell A. D. Jr.; Nilsson L.; Petrella R. J.; Roux B.; Won Y.; Archontis G.; Bartels C.; Boresch S.; Caflisch A.; Caves L.; Cui Q.; Dinner A. R.; Feig M.; Fischer S.; Gao J.; Hodoscek M.; Im W.; Kuczera K.; Lazaridis T.; Ma J.; Ovchinnikov V.; Paci E.; Pastor R. W.; Post C. B.; Pu J. Z.; Schaefer M.; Tidor B.; Venable R. M.; Woodcock H. L.; Wu X.; Yang W.; York D. M.; Karplus M. CHARMM: The Biomolecular Simulation Program. J. Comput. Chem. 2009, 30, 1545–1614. 10.1002/jcc.21287.
  32. Darden T.; York D.; Pedersen L. Particle Mesh Ewald: An N·Log(N) Method for Ewald Sums in Large Systems. J. Chem. Phys. 1993, 98, 10089–10092. 10.1063/1.464397.
  33. Parrinello M.; Rahman A. Polymorphic Transitions in Single Crystals: A New Molecular Dynamics Method. J. Appl. Phys. 1981, 52, 7182–7190. 10.1063/1.328693.
  34. Bussi G.; Donadio D.; Parrinello M. Canonical Sampling Through Velocity Rescaling. J. Chem. Phys. 2007, 126, 014101. 10.1063/1.2408420.
  35. Yu A.; Lee E. M. Y.; Jin J.; Voth G. A. Atomic-Scale Characterization of Mature HIV-1 Capsid Stabilization by Inositol Hexakisphosphate (IP6). Sci. Adv. 2020, 6, eabc6465. 10.1126/sciadv.abc6465.
  36. Mattei S.; Glass B.; Hagen W. J. H.; Kräusslich H.-G.; Briggs J. A. G. The Structure and Flexibility of Conical HIV-1 Capsids Determined within Intact Virions. Science 2016, 354, 1434–1437. 10.1126/science.aah4972.
  37. Eswar N.; Webb B.; Marti-Renom M. A.; Madhusudhan M. S.; Eramian D.; Shen M. Y.; Pieper U.; Sali A. Comparative Protein Structure Modeling Using Modeller. Curr. Protoc. Bioinformatics 2006, 15, 5.6.1–5.6.30. 10.1002/0471250953.bi0506s15.
  38. Jorgensen W. L.; Chandrasekhar J.; Madura J. D.; Impey R. W.; Klein M. L. Comparison of Simple Potential Functions for Simulating Liquid Water. J. Chem. Phys. 1983, 79, 926–935. 10.1063/1.445869.
  39. Phillips J. C.; Braun R.; Wang W.; Gumbart J.; Tajkhorshid E.; Villa E.; Chipot C.; Skeel R. D.; Kalé L.; Schulten K. Scalable Molecular Dynamics with NAMD. J. Comput. Chem. 2005, 26, 1781–1802. 10.1002/jcc.20289.
  40. Huang J.; Rauscher S.; Nawrocki G.; Ran T.; Feig M.; de Groot B. L.; Grubmuller H.; MacKerell A. D. Jr. CHARMM36m: An Improved Force Field for Folded and Intrinsically Disordered Proteins. Nat. Methods 2017, 14, 71–73. 10.1038/nmeth.4067.
  41. Luan Q.; Nolen B. J. Structural Basis for Regulation of Arp2/3 Complex by GMF. Nat. Struct. Mol. Biol. 2013, 20, 1062–1068. 10.1038/nsmb.2628.
  42. Ding B.; Narvaez-Ortiz H. Y.; Singh Y.; Hocky G. M.; Chowdhury S.; Nolen B. J. Structure of Arp2/3 Complex at a Branched Actin Filament Junction Resolved by Single-Particle Cryo-Electron Microscopy. Proc. Natl. Acad. Sci. U. S. A. 2022, 119, e2202723119. 10.1073/pnas.2202723119.
  43. Lee J.; Cheng X.; Swails J. M.; Yeom M. S.; Eastman P. K.; Lemkul J. A.; Wei S.; Buckner J.; Jeong J. C.; Qi Y.; Jo S.; Pande V. S.; Case D. A.; Brooks C. L.; Mackerell A. D.; Klauda J. B.; Im W. CHARMM-GUI Input Generator for NAMD, GROMACS, AMBER, OpenMM, and CHARMM/OpenMM Simulations Using the CHARMM36 Additive Force Field. J. Chem. Theory Comput. 2016, 12, 405–413. 10.1021/acs.jctc.5b00935.
  44. Jo S.; Kim T.; Iyer V. G.; Im W. CHARMM-GUI: A Web-Based Graphical User Interface for CHARMM. J. Comput. Chem. 2008, 29, 1859–1865. 10.1002/jcc.20945.
  45. Hocky G. M.; Baker J. L.; Bradley M. J.; Sinitskiy A. V.; De La Cruz E. M.; Voth G. A. Cations Stiffen Actin Filaments by Adhering a Key Structural Element to Adjacent Subunits. J. Phys. Chem. B 2016, 120, 4558–4567. 10.1021/acs.jpcb.6b02741.
  46. Plimpton S. Fast Parallel Algorithms for Short-Range Molecular Dynamics. J. Comput. Phys. 1995, 117, 1–19. 10.1006/jcph.1995.1039.
  47. Vanden-Eijnden E.; Ciccotti G. Second-Order Integrators for Langevin Equations With Holonomic Constraints. Chem. Phys. Lett. 2006, 429, 310–316. 10.1016/j.cplett.2006.07.086.
  48. Gowers R.; Linke M.; Barnoud J.; Reddy T.; Melo M.; Seyler S.; Domański J.; Dotson D.; Buchoux S.; Kenney I.; Beckstein O. MDAnalysis: A Python Package for the Rapid Analysis of Molecular Dynamics Simulations. In Proceedings of the 15th Python in Science Conference, United States, 2016; pp 98–105.
  49. Michaud-Agrawal N.; Denning E. J.; Woolf T. B.; Beckstein O. MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations. J. Comput. Chem. 2011, 32, 2319–2327. 10.1002/jcc.21787.
  50. The PyMOL Molecular Graphics System, Version 1.8; Schrodinger, LLC: 2015.
  51. Dominguez R.; Holmes K. C. Actin Structure and Function. Annu. Rev. Biophys. 2011, 40, 169–186. 10.1146/annurev-biophys-042910-155359.
  52. Saunders M. G.; Voth G. A. Comparison between Actin Filament Models: Coarse-Graining Reveals Essential Differences. Structure 2012, 20, 641–653. 10.1016/j.str.2012.02.008.
  53. Shvetsov A.; Galkin V. E.; Orlova A.; Phillips M.; Bergeron S. E.; Rubenstein P. A.; Egelman E. H.; Reisler E. Actin Hydrophobic Loop 262–274 and Filament Nucleation and Elongation. J. Mol. Biol. 2008, 375, 793–801. 10.1016/j.jmb.2007.10.076.
  54. McCullagh M.; Saunders M. G.; Voth G. A. Unraveling the Mystery of ATP Hydrolysis in Actin Filaments. J. Am. Chem. Soc. 2014, 136, 13053–13058. 10.1021/ja507169f.
  55. Zheng X.; Diraviyam K.; Sept D. Nucleotide Effects on the Structure and Dynamics of Actin. Biophys. J. 2007, 93, 1277–1283. 10.1529/biophysj.107.109215.
  56. Holmes K. C.; Popp D.; Gebhard W.; Kabsch W. Atomic Model of the Actin Filament. Nature 1990, 347, 44–49. 10.1038/347044a0.
  57. Kabsch W.; Mannherz H. G.; Suck D.; Pai E. F.; Holmes K. C. Atomic Structure of the Actin: DNase I Complex. Nature 1990, 347, 37–44. 10.1038/347037a0.
  58. Grime J. M. A.; Dama J. F.; Ganser-Pornillos B. K.; Woodward C. L.; Jensen G. J.; Yeager M.; Voth G. A. Coarse-Grained Simulation Reveals Key Features of HIV-1 Capsid Self-Assembly. Nat. Commun. 2016, 7, 11568. 10.1038/ncomms11568.
  59. Pollard T. D. Regulation of Actin Filament Assembly by Arp2/3 Complex and Formins. Annu. Rev. Biophys. Biomol. Struct. 2007, 36, 451–477. 10.1146/annurev.biophys.35.040405.101936.
  60. Chesarone M. A.; Goode B. L. Actin Nucleation and Elongation Factors: Mechanisms and Interplay. Curr. Opin. Cell Biol. 2009, 21, 28–37. 10.1016/j.ceb.2008.12.001.
  61. Skau C. T.; Waterman C. M. Specification of Architecture and Function of Actin Structures by Actin Nucleation Factors. Annu. Rev. Biophys. 2015, 44, 285–310. 10.1146/annurev-biophys-060414-034308.
  62. Goley E. D.; Rodenbusch S. E.; Martin A. C.; Welch M. D. Critical Conformational Changes in the Arp2/3 Complex Are Induced by Nucleotide and Nucleation Promoting Factor. Mol. Cell 2004, 16, 269–279. 10.1016/j.molcel.2004.09.018.
  63. Mani S.; Katkar H. H.; Voth G. A. Compressive and Tensile Deformations Alter ATP Hydrolysis and Phosphate Release Rates in Actin Filaments. J. Chem. Theory Comput. 2021, 17, 1900–1913. 10.1021/acs.jctc.0c01186.
  64. Katkar H. H.; Davtyan A.; Durumeric A. E. P.; Hocky G. M.; Schramm A. C.; De La Cruz E. M.; Voth G. A. Insights into the Cooperative Nature of ATP Hydrolysis in Actin Filaments. Biophys. J. 2018, 115, 1589–1602. 10.1016/j.bpj.2018.08.034.
  65. Foley T. T.; Shell M. S.; Noid W. G. The Impact of Resolution Upon Entropy and Information in Coarse-Grained Models. J. Chem. Phys. 2015, 143, 243104. 10.1063/1.4929836.
  66. Tan A.; Pak A. J.; Morado D. R.; Voth G. A.; Briggs J. A. G. Immature HIV-1 Assembles From Gag Dimers Leaving Partial Hexamers at Lattice Edges as Potential Substrates for Proteolytic Maturation. Proc. Natl. Acad. Sci. U. S. A. 2021, 118, e2020054118. 10.1073/pnas.2020054118.
  67. Peng Y.; Pak A. J.; Durumeric A. E. P.; Sahrmann P. G.; Mani S.; Jin J.; Loose T. D.; Beiter J.; Voth G. A. OpenMSCG: A Software Tool for Bottom-Up Coarse-Graining. J. Phys. Chem. B 2023, 127, 8537–8550. 10.1021/acs.jpcb.3c04473.

Articles from Journal of Chemical Theory and Computation are provided here courtesy of American Chemical Society