
This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2024 Nov 1:2024.10.29.620916. [Version 1] doi: 10.1101/2024.10.29.620916

COCOMO2: A coarse-grained model for interacting folded and disordered proteins

Alexander Jussupow 1, Divya Bartley 1, Lisa J Lapidus 2, Michael Feig 1,*
PMCID: PMC11565878  PMID: 39554101

Abstract

Biomolecular interactions are essential in many biological processes, including complex formation and phase separation processes. Coarse-grained computational models are especially valuable for studying such processes via simulation. Here, we present COCOMO2, an updated residue-based coarse-grained model that extends its applicability from intrinsically disordered peptides to folded proteins. This is accomplished with the introduction of a surface exposure scaling factor, which adjusts interaction strengths based on solvent accessibility, to enable the more realistic modeling of interactions involving folded domains without additional computational costs. COCOMO2 was parameterized directly with solubility and phase separation data to improve its performance on predicting concentration-dependent phase separation for a broader range of biomolecular systems compared to the original version. COCOMO2 enables new applications including the study of condensates that involve IDPs together with folded domains and the study of complex assembly processes. COCOMO2 also provides an expanded foundation for the development of multi-scale approaches for modeling biomolecular interactions that span from residue-level to atomistic resolution.

Graphical Abstract


INTRODUCTION

Intermolecular interactions between biomolecules play a central role in many aspects of biology. Specific interactions lead to oligomerization1, aggregation2, and the formation of large assemblies such as the ribosome3, the mediator complex4, the nuclear pore complex5, virus capsids6, or bacterial microcompartments7. In the crowded cellular interior, non-specific interactions are inevitable, leading to clustering8 or phase separation9. Biomolecular phase separation, particularly liquid-liquid phase separation (LLPS), plays a crucial role in the formation of membrane-less organelles within cells, driving the compartmentalization of essential biological processes. These condensates, which include organelles such as stress granules10, nucleoli11, and P-bodies12, form through a dynamic and reversible process involving proteins, RNA, and other biomolecules13,14. Understanding the molecular mechanisms behind complex assembly, dynamic clustering, and phase separation is key to uncovering how cells organize biochemical function in space and time13,15.

To complement experiments, computational tools like molecular dynamics (MD) simulations, both atomistic16,17 and coarse-grained18–20, have been employed to study biomolecular interactions and the formation of higher-order structures. Atomistic simulations provide detailed representations of every atom in a biomolecule, offering precise insights into molecular interactions. However, their high computational cost limits their applicability to small systems and short time scales (typically not exceeding microseconds)21. Coarse-grained models overcome these limitations by simplifying the representation of biomolecules, thus enabling simulations of larger systems over longer time scales22,23 and making them particularly suited for studying concentration-dependent phase separation, condensate formation18,19, and biomolecular complex assembly processes24,25. The original COCOMO (Concentration-dependent Condensation Model)26 was developed as a one-bead-per-residue coarse-grained model specifically capturing key interactions between intrinsically disordered proteins (IDPs) and RNA that drive phase behavior. However, folded proteins and multi-domain proteins (MDPs) often also play an active role in liquid-liquid phase separation27–29 and are the main components of complex assemblies. This creates a need for a revised model that is suitable for both IDPs and proteins with folded domains.

Here, we present COCOMO2, an improved version of the original COCOMO force field that extends its applicability to folded and multi-domain proteins. Building on the original COCOMO model26, we modeled folded domains via elastic network restraints and introduced a surface exposure scaling factor λ to modulate the interaction strength of residues based on their degree of solvent accessibility. This adjustment ensures that buried and partially exposed residues contribute less to intermolecular interactions than surface-exposed residues, effectively capturing solvation effects without the additional expense of an implicit solvent model30,31. We also refined the approach for determining saturation concentrations (csat)32 from simulations and introduced a computationally efficient, theory-based protocol for approximating csat from the potential energy of condensate structures33–35.

This allowed us to parameterize COCOMO2 directly using csat data obtained either from LLPS or solubility experiments, unlike other models that rely primarily on matching single-chain properties19,20,36–38, including the recent expansion of CALVADOS for multi-domain proteins36. In contrast to CALVADOS3, which assigns individual parameters to each amino acid, COCOMO2 provides a simpler model with fewer parameters by grouping polar and hydrophobic residues. COCOMO2 demonstrates significant improvements in accuracy for both phase separation behavior and single-molecule properties compared to the original COCOMO model, providing a more versatile framework for studying phase separation and solubility phenomena in diverse cellular contexts, including IDPs, MDPs, and RNA(-protein) condensates. While primarily focused on non-specific interactions, COCOMO2 could also be applied to model specific interactions involved in complex assemblies by adding system-specific interaction terms.

METHODS

Coarse-Grained Model

COCOMO2 builds on the structure of the original COCOMO model. Each amino acid or RNA nucleotide residue is represented as a single spherical particle. The total interaction energy in the system is defined as:

$$U_{\text{total}} = U_{\text{bond}} + U_{\text{angle}} + U_{\text{short-range}} + U_{\text{electrostatic}} \qquad (1)$$

Ubond represents the bonded potential, with a harmonic bond potential ensuring connectivity along the chain:

$$U_{\text{bond}} = \sum_{i=1}^{N-1} \frac{1}{2}\, k_{\text{bond}} \left(l_{i,i+1} - l_0\right)^2 + U_{\text{ENM}} \qquad (2)$$

where $l_{i,i+1}$ is the distance between two neighboring residues, $k_{\text{bond}} = 4184$ kJ/(mol·nm²) is the bond constant, and $l_0$ is the equilibrium bond length. $l_0$ is set to 0.38 nm for proteins, the average Cα–Cα distance, and to 0.5 nm for nucleotides, the average backbone distance for single-stranded nucleic acids39.

For folded domains, an additional elastic network model (ENM) is applied to stabilize the higher-order structural elements:

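$$U_{\text{ENM}} = \sum_{\text{pairs } (i,j),\; r_0 \le 0.9\,\text{nm}} \frac{1}{2}\, k_{\text{ENM}} \left(r_{i,j} - r_0\right)^2 \qquad (3)$$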

with $r_{i,j}$ as the distance between two beads in a folded domain, $k_{\text{ENM}} = 500$ kJ/(mol·nm²) as the force constant of the ENM, and $r_0$ as the equilibrium distance based on the initial reference conformation. Only residue pairs with $r_0 \le 0.9$ nm are considered for the ENM. Additionally, consecutive residues are excluded, as they are already accounted for by the bond potential.
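The pair selection for the elastic network can be sketched as follows. This is a minimal illustration, not the COCOMO2 implementation: it assumes coordinates in nm for a single folded domain, counts each pair once, treats the 0.9 nm cutoff as inclusive, and ignores periodic images; the function names are hypothetical.

```python
import math

def enm_pairs(coords, cutoff=0.9):
    """Return (i, j, r0) for residue pairs of one folded domain with r0 <= cutoff (nm).

    Consecutive residues (j == i + 1) are skipped because the bond term already
    connects them; each pair is listed once (i < j).
    """
    pairs = []
    n = len(coords)
    for i in range(n):
        for j in range(i + 2, n):  # start at i + 2 to exclude bonded neighbours
            r0 = math.dist(coords[i], coords[j])  # reference distance from the initial structure
            if r0 <= cutoff:
                pairs.append((i, j, r0))
    return pairs

def enm_energy(coords, pairs, k_enm=500.0):
    """Harmonic ENM energy (kJ/mol) of Eq. 3 for the current coordinates (nm)."""
    return sum(0.5 * k_enm * (math.dist(coords[i], coords[j]) - r0) ** 2
               for i, j, r0 in pairs)
```

Because the pair list and $r_0$ values come from the reference structure only, they can be computed once before the simulation starts.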

The angle potential Uangle maintains the chain stiffness:

$$U_{\text{angle}} = \sum_{i=1}^{N-2} \frac{1}{2}\, k_{\text{angle}} \left(\theta_{i,i+1,i+2} - \theta_0\right)^2 \qquad (4)$$

with $\theta_{i,i+1,i+2}$ as the angle between three consecutive beads, $k_{\text{angle}} = 4.184$ kJ/(mol·rad²) as the angle constant for proteins and 4.184 kJ/(mol·rad²) for nucleic acids, and $\theta_0 = 180^\circ$ as the target angle.
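The harmonic bond and angle terms (Eq. 2 without the ENM part, and Eq. 4) can be evaluated for a single linear chain as in the sketch below. The defaults follow the protein values given in the text (0.38 nm, 4184 kJ/(mol·nm²), 4.184 kJ/(mol·rad²), θ0 = 180°); the function name is hypothetical and periodic boundaries are ignored.

```python
import math

def bonded_energy(coords, l0=0.38, k_bond=4184.0, k_angle=4.184, theta0=math.pi):
    """Return (U_bond, U_angle) in kJ/mol for one chain with coordinates in nm."""
    u_bond = 0.0
    for a, b in zip(coords, coords[1:]):
        u_bond += 0.5 * k_bond * (math.dist(a, b) - l0) ** 2

    u_angle = 0.0
    for a, b, c in zip(coords, coords[1:], coords[2:]):
        # Angle at the middle bead b, between the vectors b->a and b->c.
        v1 = [p - q for p, q in zip(a, b)]
        v2 = [p - q for p, q in zip(c, b)]
        cos_t = sum(x * y for x, y in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
        theta = math.acos(max(-1.0, min(1.0, cos_t)))  # clamp against rounding error
        u_angle += 0.5 * k_angle * (theta - theta0) ** 2
    return u_bond, u_angle
```

With θ0 = 180°, a fully extended chain has zero angle energy, and the small force constant only weakly penalizes bending.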

The pairwise nonbonded short-range 10–5 Lennard-Jones potential (Ushort-range) is slightly modified in COCOMO2 compared to the original COCOMO:

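$$U_{\text{short-range}} = \sum_{i,j} \xi_{i,j}\, 4\left(\varepsilon_{i,j} + \varepsilon_{\text{mod}}\right) \left[\left(\frac{\sigma_{i,j}}{r_{i,j}}\right)^{10} - \left(\frac{\sigma_{i,j}}{r_{i,j}}\right)^{5}\right] \qquad (5)$$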

where $r_{i,j}$ is the inter-particle distance and $\sigma_{i,j} = 0.5(\sigma_i + \sigma_j)$ is the distance at which the potential is zero. The bead size parameters $\sigma_i$ were set as $\sigma_i = 2 r_i \cdot 2^{-1/6}$, where $r_i$ is the radius of a sphere with the same volume as the given residue. $\varepsilon_{i,j} = \varepsilon_i \varepsilon_j$ is the depth of the potential well. $\varepsilon_{\text{mod}}$ is added to enhance interactions either between positively charged residues (Arg, Lys) and aromatic residues (Phe, Tyr, Trp) ($\varepsilon_{\text{R/K–F/Y/W}} = 0.3$ kJ/mol), between positively charged residues and nucleotides ($\varepsilon_{\text{R/K–nucleic}} = 0.2$ kJ/mol), or, new in COCOMO2, between aromatic residues ($\varepsilon_{\text{F/Y/W–F/Y/W}} = 0.1$ kJ/mol). $\xi_{i,j} = \xi_i \xi_j$ is the interaction strength. For disordered regions, the interaction strength ξ remains constant at 1, while for residues in folded domains, ξ is scaled according to Eq. 6.
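A minimal sketch of the pair term in Eq. 5. To avoid re-implementing the combination rules, it assumes that $\varepsilon_{i,j}$, $\xi_{i,j}$ and $\varepsilon_{\text{mod}}$ have already been combined for the pair; it also ignores the 3 nm cutoff. The function names and the example radii in the usage are hypothetical.

```python
def sigma_from_radius(r_i):
    """Bead size sigma_i = 2 * r_i * 2**(-1/6), with r_i in nm."""
    return 2.0 * r_i * 2.0 ** (-1.0 / 6.0)

def lj_10_5(r, sigma_ij, eps_ij, eps_mod=0.0, xi_ij=1.0):
    """10-5 Lennard-Jones pair energy of Eq. 5 in kJ/mol (r and sigma_ij in nm)."""
    x5 = (sigma_ij / r) ** 5
    return xi_ij * 4.0 * (eps_ij + eps_mod) * (x5 * x5 - x5)

# Usage with hypothetical residue radii of 0.30 and 0.35 nm:
sigma = 0.5 * (sigma_from_radius(0.30) + sigma_from_radius(0.35))
r_min = 2.0 ** 0.2 * sigma
print(lj_10_5(r_min, sigma, eps_ij=0.295, eps_mod=0.1))  # approx. -0.395
```

Setting $(\sigma_{i,j}/r)^5 = 1/2$ places the minimum at $r = 2^{1/5}\sigma_{i,j}$, where the energy equals $-\xi_{i,j}(\varepsilon_{i,j} + \varepsilon_{\text{mod}})$; ξ therefore rescales the well depth directly.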

The surface exposure scaling factor λ directly influences the interaction strength ξ, modulating electrostatic interactions based on solvent accessibility:

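$$\xi = \begin{cases} 1 & \text{for } S_{\text{ratio}} \ge \lambda \\ S_{\text{ratio}}/\lambda & \text{for } S_{\text{ratio}} < \lambda \end{cases} \qquad (6)$$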

with $S_{\text{ratio}} = S_{\text{residue}}/S_{\text{ref}}$ as the ratio between the surface area of a residue ($S_{\text{residue}}$) and an amino-acid-specific reference area ($S_{\text{ref}}$). $S_{\text{ref}}$ was calculated as the surface area of the amino acid embedded in an alanine α-helix and is reported in Table S1 (in nm²). The interaction strength is always 1 for residues in disordered regions, while for residues in folded domains it decreases linearly with $S_{\text{ratio}}$ below λ, reaching zero for fully buried residues. ξ is calculated from the initial reference structure and is not updated during the simulation.
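Eq. 6 maps each residue's relative exposure to an interaction strength. The sketch below uses λ = 0.7 (the COCOMO2 value in Table 1) as the default; the function name is hypothetical, and the surface areas in the usage are made-up illustrative numbers rather than values from Table S1.

```python
def surface_scaling(s_residue, s_ref, lam=0.7, folded=True):
    """Interaction strength xi (Eq. 6) from a residue's SASA and its reference area."""
    if not folded:
        return 1.0  # residues in disordered regions keep full strength
    s_ratio = s_residue / s_ref
    return 1.0 if s_ratio >= lam else s_ratio / lam

# Usage with hypothetical areas (nm^2): a half-buried residue in a folded domain
print(surface_scaling(0.35, 1.0))  # 0.5
```

Since the folded domains are held together by the ENM, these ξ values can be assigned once per residue and reused as static per-particle parameters.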

Electrostatic effects are described with an adjusted Debye–Hückel potential (Uelectrostatic):

$$U_{\text{electrostatic}} = \sum_{i,j} \rho_{i,j}\, 4\, \frac{A_{i,j} + A_{0,ij}}{r_{i,j}}\, e^{-r_{i,j}/\kappa} \qquad (7)$$

where $r_{i,j}$ is the inter-particle distance. $A_{i,j} = A_i A_j$ reflects the attractive or repulsive electrostatic interactions, with $A_i = \operatorname{sign}(q_i)\sqrt{0.75\,|q_i|}$ calculated from the residue charges $q_i$ (see ref. 40). $A_{0,ij} = A_{0,i} A_{0,j}$ reflects the effective repulsion due to solvation effects. The Debye screening length κ is set to 1 nm, corresponding to an ionic strength of ~100 mM.
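The correspondence between κ = 1 nm and ~100 mM can be checked with the textbook Debye length of a 1:1 electrolyte, $\lambda_D^2 = \varepsilon_r \varepsilon_0 k_B T / (2 N_A e^2 I)$. This is only a sanity check: in the model κ is a fixed parameter, and the relative permittivity (78.5) and temperature (298 K) below are assumptions for water at room temperature.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
K_B = 1.380649e-23           # Boltzmann constant, J/K
N_A = 6.02214076e23          # Avogadro constant, 1/mol
EPS_0 = 8.8541878128e-12     # vacuum permittivity, F/m

def debye_length_nm(ionic_strength_M, temperature_K=298.0, eps_r=78.5):
    """Debye screening length (nm) for a 1:1 electrolyte at the given ionic strength (M)."""
    ionic_strength = ionic_strength_M * 1000.0  # mol/L -> mol/m^3
    lam_sq = eps_r * EPS_0 * K_B * temperature_K / (2.0 * N_A * E_CHARGE ** 2 * ionic_strength)
    return math.sqrt(lam_sq) * 1e9  # m -> nm

def ionic_strength_for_length_M(length_nm, temperature_K=298.0, eps_r=78.5):
    """Invert lambda_D proportional to I**-0.5: ionic strength (M) giving length_nm."""
    ref = debye_length_nm(1.0, temperature_K, eps_r)  # screening length at 1 M
    return (ref / length_nm) ** 2

print(round(debye_length_nm(0.1), 2))  # about 0.96 nm at 100 mM
```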

For COCOMO2, the following parameters were optimized: $\varepsilon_{\text{polar}}$, $\varepsilon_{\text{hydrophobic}}$, $A_{0,\text{polar}}$, $A_{0,\text{hydrophobic}}$, and λ, with Arg, Asn, Asp, Cys, Gln, Glu, His, Lys, Ser, and Thr as polar residues, and Ala, Gly, Ile, Leu, Met, Phe, Pro, Trp, Tyr, and Val as hydrophobic residues. Table 1 compares the original COCOMO and the new COCOMO2 parameters.

Table 1:

Summary of the COCOMO and COCOMO2 parameters

Parameter          COCOMO        COCOMO2
εpolar             0.40 kJ/mol   0.176 kJ/mol
εhydrophobic       0.41 kJ/mol   0.295 kJ/mol
A0,polar           0.05          0
A0,hydrophobic     0             0.002
λ                  n/a           0.7

Molecular Dynamics simulations

All molecular dynamics simulations with COCOMO2 were performed using OpenMM (version 8.0.0)41. Langevin dynamics was applied with a friction coefficient of 0.01 ps−1, and simulations were run in the NVT ensemble at 298 K. The integration time step was set to 10 fs during equilibration and to 20 fs for production runs. Simulations were conducted under periodic boundary conditions with nonbonded interactions truncated at 3 nm. Residues separated by one bond were excluded from the nonbonded energy calculation. With these parameters, sampling 1 μs for a system of 180 randomly distributed 166-residue IDPs (29880 beads in total) in a 100 nm box takes 4 hours on an RTX 2080 Ti GPU. For a fully condensed system, the time increases to 11 hours because many more bead pairs fall within the 3 nm cutoff.

Single-chain simulations were performed for 23 multi-domain proteins, using the same starting structures as those used in the CALVADOS expansion for multi-domain proteins36. Systems were equilibrated for 5000 steps, followed by 1 μs production runs. Details of the systems, simulation box sizes, and folded domain definitions are provided in Table S2. The radii of gyration (Rg) were calculated using the MDtraj library42, and the Sresidue values were determined using the Gromacs SASA tool43,44. Multi-chain simulations typically involved 100 to 300 chains, with box sizes between 50 and 200 nm. Production runs were generally conducted for 5 μs each, with detailed simulation conditions, including folded domain definitions, listed in Tables S3–S7. Protein systems with folded domains were taken from Golovanov et al.45 or Cao et al.36. For estimating the saturation concentration (csat), simulations were set up as a mixture of pre-formed condensate and monomers to ensure smoother convergence and to avoid hysteresis effects (see below). All visualizations were done with VMD46.
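Concentrations in these multi-chain setups follow directly from the chain count and the cubic box volume. A small sketch of the conversion in both directions (hypothetical helper names; 1 nm³ = 10⁻²⁴ L):

```python
AVOGADRO = 6.02214076e23  # 1/mol

def concentration_uM(n_chains, box_nm):
    """Molar concentration (uM) of n_chains in a cubic box with edge length box_nm."""
    volume_l = box_nm ** 3 * 1e-24  # nm^3 -> L
    return n_chains / AVOGADRO / volume_l * 1e6

def box_edge_nm(n_chains, conc_uM):
    """Cubic box edge (nm) that yields conc_uM for n_chains."""
    volume_l = n_chains / AVOGADRO / (conc_uM * 1e-6)
    return (volume_l * 1e24) ** (1.0 / 3.0)

print(round(concentration_uM(180, 100.0), 1))  # 180 chains in a 100 nm box: ~298.9 uM
```

The same conversion turns a count of free chains in the dilute phase into a monomer concentration.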

Saturation concentration estimation

In COCOMO, the csat values were estimated as the highest concentration at which condensate formation was not observed during the simulation time, while the critical concentration (ccrit) was defined as the lowest concentration at which condensate formation was observed. Alternatively, one can determine ccrit by starting simulations from a pre-formed condensate to find the minimum concentration at which a condensate remains stable.

In the absence of supersaturation effects, csat should approach ccrit given a sufficiently long timescale, and csat is expected to equal the concentration of the dilute phase at coexistence with condensates, whether coexistence is reached from the disperse phase or from a pre-formed condensate. However, determining csat or ccrit via simulations started from the disperse phase may overestimate saturation concentrations, whereas simulations started from a condensate may underestimate them (Fig. 1). This is due to the time it takes to nucleate condensation or to melt a condensate. Since nucleation is a stochastic process, nucleation events may not occur within the simulation timescale14, particularly at concentrations just above csat or at very low concentrations where intermolecular encounters are infrequent.

Figure 1. Impact of starting conditions on RLP phase separation.


(A) Monomer fraction after 5 μs (averaged over the last microsecond) as a function of the initial RLP concentration, for simulations initiated from 180 copies of RLP in a random distribution (red diamonds) vs. a pre-formed condensate (blue circles). (B) Terminal snapshots of simulations at 5.0 μM and 39.7 μM RLP concentrations for random distribution start (top) and condensate start (bottom).

The hysteresis effect with finite-length simulations is illustrated in Fig. 1 for simulations of 180 molecules of Resilin-Like Polypeptide (RLP)47,48 at concentrations ranging from 1 to 120 μM, starting either from a random distribution or from a pre-formed condensate. When starting with randomly distributed monomers, no stable clusters or condensates were observed below 10 μM during a 5 μs simulation (Fig. 1A). In contrast, simulations started from a fully condensed phase showed minimal monomer concentrations, even at 1 μM after 5 μs, with average monomer concentrations ranging from <0.01 to 0.3 μM depending on the initial concentration (Fig. S1). Additionally, the convergence time toward an equilibrium monomer concentration from a random distribution depends strongly on the total RLP concentration (Fig. S1), with higher concentrations leading to a faster decline in free monomers. To address these challenges, we adopted a mixed-phase starting condition, combining condensed and dilute phases at concentrations near the expected critical threshold, typically the experimental value. This approach allows the system to equilibrate more naturally, toward either lower or higher density in the dilute phase. Overall, the mixed-phase approach provides a more reliable and less biased strategy for estimating csat, reducing the impact of kinetic barriers introduced by extreme starting conditions.
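Monomer fractions such as those in Fig. 1A require deciding which chains are free. The text does not state the criterion used, so the sketch below assumes one: a chain counts as a free monomer if none of its beads lies within a hypothetical contact cutoff (1 nm here) of any bead of another chain, using minimum-image distances in a cubic periodic box. The brute-force loops are for illustration only; production analyses would use a neighbour search.

```python
import math

def min_image_dist(a, b, box):
    """Distance between two points in a cubic periodic box (minimum image convention)."""
    d = [(x - y) - box * round((x - y) / box) for x, y in zip(a, b)]
    return math.hypot(*d)

def free_monomer_count(chains, box, cutoff=1.0):
    """Count chains with no bead within `cutoff` (nm) of a bead of any other chain.

    `chains` is a list of chains, each a list of (x, y, z) bead positions in nm.
    The cutoff value is an assumption, not taken from the paper.
    """
    n = len(chains)
    in_contact = [False] * n
    for i in range(n):
        for j in range(i + 1, n):
            if any(min_image_dist(a, b, box) < cutoff for a in chains[i] for b in chains[j]):
                in_contact[i] = in_contact[j] = True
    return in_contact.count(False)
```

Dividing the free-chain count by the total number of chains gives the monomer fraction; converting it with the box volume gives a dilute-phase concentration.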

Parameter optimization process

Optimization of COCOMO, like other similar coarse-grained force fields19,20,36–38, relied heavily on single-chain properties such as the radius of gyration (Rg) to parameterize non-bonded interactions. Phase separation data were primarily used for validation or further refinement, since estimating csat via simulations is significantly more challenging than determining single-chain properties. In contrast, the re-parameterization of COCOMO2 was based directly on experimental phase separation and solubility data across a range of protein systems, including both intrinsically disordered proteins (IDPs) and multi-domain proteins (MDPs). The selected IDPs included four systems from the original COCOMO study26 (FUS LCD49,50, LAF-151, A1 LCD52, and hTau40-k1853) along with RLP 47,48 and alpha-synuclein (aSyn)54. For multi-domain proteins, we used two systems with phase-separation data (hnRNPA1 and hSUMO_hnRNPA1)27,36 and three single-domain proteins (MAGOH45,55, RefNM45,56, and Y14 45,57) with solubility data under comparable experimental conditions.

The parameterization approach leverages the exponential relationship between the interaction energy U and csat known from theory33–35:

$$c_{\text{sat}} \sim \exp\left(-U/k_B T\right) \qquad (8)$$

where kB is the Boltzmann constant, and T the temperature. For each of the training systems, simulations were performed with different force field parameters, keeping the number of molecules and box sizes consistent (Table S5). We selected simulations that achieved equilibrium between monomers in solution and those in the condensate, allowing us to estimate csat. From a parameter set where no monomer remained in solution, we extracted ten representative conformations and used them to calculate interaction energies across different parameters. This enabled us to fit a linear relationship between U and the logarithm of csat:

$$\log_{10} c_{\text{sat}} = aU + b \qquad (9)$$

The fitting parameters a and b are reported in Table S8. To account for the presence of outliers, we applied the random sample consensus (RANSAC) algorithm58 implemented in sklearn59, which identifies the most reliable data points when generating linear fits.
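The paper uses the RANSAC implementation in scikit-learn. The dependency-free sketch below only illustrates the idea behind it for Eq. 9 (x = energy U, y = log10 csat): fit candidate lines through random point pairs, keep the candidate with the most points within a residual threshold, then refit those inliers by least squares. It is not scikit-learn's algorithm, and the threshold and iteration count are hypothetical.

```python
import random

def fit_line(points):
    """Ordinary least-squares slope a and intercept b for (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx
    return a, my - a * mx

def ransac_line(points, n_iter=200, threshold=0.1, seed=0):
    """Minimal RANSAC line fit: largest consensus set from random pairs, then refit."""
    rng = random.Random(seed)
    best = []
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical candidate line, cannot express as y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) < threshold]
        if len(inliers) > len(best):
            best = inliers
    return fit_line(best)
```

Points far from the consensus line (for example, poorly converged simulations) then have no influence on the fitted a and b.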

Our parameter optimization process for COCOMO2 began with an initial scan of the parameter space using only the selected IDP systems, excluding the surface scaling factor λ as it has no impact on the IDPs (Fig. S2). This scan generated a preliminary set of parameters which were used as a starting point for further refinement. Next, the initial parameters were refined using the L-BFGS-B minimization algorithm60. This step aimed to minimize the difference between experimental csat values and those estimated from interaction energies, improving alignment between the model and experimental data. Finally, the parameters were re-optimized using the complete set of IDPs and multi-domain proteins, this time including λ, to ensure that the force field could capture the behavior of both disordered and folded protein systems. We limited the maximum value of λ to 0.7, as that value worked well with the original COCOMO model (see below) and because there were significant changes in the morphology of the condensates with higher λ values (Fig. S3).
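The initial parameter-space scan can be sketched as a grid search over an objective that compares experimental log10 csat values with those predicted from Eq. 9. In this sketch each training system is a hypothetical dictionary holding its fitted a and b, its experimental csat, and a callable `energy(params)` that is assumed to re-evaluate the average per-residue energy of the stored condensate frames under a given parameter set; the names and the two-parameter grid are illustrative only.

```python
import itertools
import math

def log_csat_rmsd(params, systems):
    """RMSD between experimental log10(csat) and a*U(params) + b over all systems."""
    sq = []
    for s in systems:
        u = s["energy"](params)          # energy of stored condensate frames (assumed callable)
        pred = s["a"] * u + s["b"]       # Eq. 9
        sq.append((pred - math.log10(s["csat_exp"])) ** 2)
    return math.sqrt(sum(sq) / len(sq))

def grid_scan(systems, eps_polar_grid, eps_hydro_grid):
    """Exhaustive scan over (eps_polar, eps_hydrophobic); returns (best_params, rmsd)."""
    best = None
    for ep, eh in itertools.product(eps_polar_grid, eps_hydro_grid):
        params = {"eps_polar": ep, "eps_hydrophobic": eh}
        score = log_csat_rmsd(params, systems)
        if best is None or score < best[1]:
            best = (params, score)
    return best
```

For the refinement step, the same objective could be wrapped to accept an array and passed to `scipy.optimize.minimize(..., method="L-BFGS-B", bounds=...)`, with bounds used, for example, to cap λ at 0.7.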

After optimizing the parameters, we ran additional simulations to evaluate the differences between experimental csat values and those estimated via U or obtained directly from simulations. We used three more systems (TAP 45,61, GFP FUS36,62, and WW3445,63) for additional testing based on solubility and phase separation data. Additionally, we selected 14 IDPs from the original COCOMO dataset as well as the 23 previously mentioned MDPs to test COCOMO2's ability to reproduce radii of gyration (Rg) despite not being trained against them. We further evaluated COCOMO2 performance on heterotypic protein systems (e.g., FUS LCD with (RGRGG)5, Table S6) and protein-RNA systems (Table S7), for both of which experimental phase separation data are available.

RESULTS

Improvements with surface scaling for folded domains

The applicability of the COCOMO force field to proteins with folded domains was tested with a set of 23 MDPs, which were introduced previously when reparametrizing CALVADOS64. The primary goal was to assess how well the model could reproduce experimental Rg values (Fig. 2A). As COCOMO is not designed to fold proteins or preserve secondary structures, an elastic network was used to preserve secondary and higher-order structural elements in folded domains, which is a common approach for coarse-grained simulations36,65. From 1 μs MD simulations, the original COCOMO model could not reproduce the experimental Rg values, as the model consistently predicted too compact conformations, resulting in significantly underestimated Rg values (purple points in Fig. 2A). The relative root mean-square deviation (RMSD) between experimental and calculated Rg values is 36.2% (Fig. 2B).
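The comparison above rests on two quantities: Rg per conformation and a relative RMSD across proteins. The paper computed Rg with MDtraj; the pure-Python sketch below only illustrates the unweighted formula (one bead per residue, all beads weighted equally) and assumes the relative RMSD is defined as the root mean square of (Rg_sim − Rg_exp)/Rg_exp, since the text does not spell out the formula. In practice Rg would be averaged over trajectory frames first.

```python
import math

def radius_of_gyration(coords):
    """Unweighted Rg (nm) of one conformation given bead coordinates in nm."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    return math.sqrt(sum(math.dist(c, center) ** 2 for c in coords) / n)

def relative_rmsd_percent(predicted, experimental):
    """Assumed definition: 100 * sqrt(mean(((pred - exp) / exp) ** 2))."""
    terms = [((p - e) / e) ** 2 for p, e in zip(predicted, experimental)]
    return 100.0 * math.sqrt(sum(terms) / len(terms))
```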

Figure 2. Effect of surface scaling factor (λ) on Rg of multi-domain proteins.


(A) Comparison between the experimental radius of gyration (Rg) of 23 multi-domain proteins and the predicted Rg using the original COCOMO model and COCOMO2 with varying values of λ. Without λ, COCOMO underestimates the Rg, as indicated by the deviation from the diagonal (dashed line). (B) Relative root mean-square deviation (RMSD) between predicted and experimental Rg as a function of λ. The RMSD decreases as λ increases.

To correct the overestimated interactions between folded domains, we introduced a surface exposure scaling factor λ, which adjusts the interaction strength of individual residues in folded domains based on their degree of solvent accessibility. Folded proteins feature both solvent-exposed residues, which can actively participate in intermolecular interactions, and buried residues, which primarily stabilize the internal structure. Moreover, buried residues contribute less to the solvation free energy66. In contrast, IDPs are generally more flexible, and their residues are similarly solvent-exposed and contribute similarly to intramolecular interactions. Because COCOMO was originally parametrized for IDPs, an adjustment for buried residues in folded domains is needed to effectively account for solvation effects.

More specifically, the scaling factor λ is defined as the threshold for the ratio between a residue’s surface area in the initial structure and an amino acid-specific reference value (Sratio). If Sratio is larger than λ, the full interaction strength is applied. Otherwise, the interaction strength is scaled linearly as a function of surface exposure, reaching zero for fully buried residues (see method section, Eq. 6). This approach effectively reduces the inter- and intramolecular interactions of folded domains while leaving the interactions of flexible domains and IDPs unchanged. It is important to note that the folded domains are kept intact via elastic network restraints so that the degree of solvent exposure remains the same throughout a simulation of a given system with folded domains. This allowed us to determine which residues have reduced interactions once at the beginning of the simulations and then simply apply different parameters to those residues without having to introduce a surface-exposure dependent term that is costly to evaluate continuously during simulations30,31.

The introduction of λ significantly improved the agreement between the experimental and predicted Rg values for MDPs (Fig. 2A, B), reducing the RMSD from 36.2% to 18.1% at λ = 0.85. Beyond single-chain properties, the scaling factor also affects phase-separation behavior, as illustrated with the folded protein MAGOH45,55 (Fig. 3). At low λ values, where reduced interactions mainly affect buried residues, few free monomers remain, leading to a low csat. At λ ≈ 0.45, the predicted critical concentration is similar to the experimental solubility value of 0.11 mM. We note that the logarithm of csat depends linearly on λ, as expected from theory33–35 (Eq. 8), since the interaction strength scales with λ and critical concentrations for phase transitions depend exponentially on interaction strength. We will take advantage of this relationship in the next section.

Figure 3. Effect of surface scaling factor (λ) on saturation concentration (csat) on the example of MAGOH.


(A) Effect of λ on the csat of MAGOH. The critical concentration is calculated as the average free monomer concentration over the last 5 μs in a 10 μs long simulation. As λ increases, the predicted critical concentration of MAGOH approaches the experimental solubility threshold (dashed line). (D) Representative snapshots of MAGOH simulations with different λ.

λ-scaled interactions significantly improve Rg and solubility estimates. The agreement between experimental and simulated csat also improves for systems that combine disordered regions with folded domains (Fig. 4), but it was difficult to identify an optimal value of λ with the original COCOMO model without further optimization.

Figure 4. Dependency between potential energy and critical concentration for IDPs and folded proteins.


Comparison between the experimental and simulated csat values for IDPs (A) and multi-domain proteins (B) for different versions of the model: COCOMO2, COCOMO with λ=0.7, and COCOMO without surface scaling. The left panel shows results for the six IDPs, while the right panel presents data for proteins with folded domains. Circles represent the five proteins with folded domains used for parameterization, while diamonds indicate three additional systems (TAP, GFP FUS, WW34) used for validation. With the original COCOMO, we observed cases where no chains left the condensate during the simulation time and therefore csat could not be evaluated.

Optimizing force field parameters to reproduce critical concentrations

To further optimize COCOMO, we incorporated solubility and phase separation data directly into the parameterization process. While single-chain properties, such as Rg, tend to converge fast and require just one molecule, optimizing parameters based on csat presents a greater challenge. Optimization becomes more feasible when direct correlations exist between csat and interaction parameters or with parameter-dependent quantities that can be calculated from pre-existing structural ensembles. This avoids the need to run extensive simulations repeatedly to determine csat empirically for each parameter set. One such useful correlation we identified is the relationship between csat and λ.

Based on Eq. 8, we hypothesized that the potential energy averaged over a limited number of representative condensate structures could serve as a reliable proxy for estimating csat (Fig. 5). To explore these relationships, we ran simulations for six IDPs and five folded proteins, systematically varying key parameters: the surface scaling factor λ, the Lennard-Jones potential depth (ε), and the solvation parameter (A0) for polar and hydrophobic residues (cf. Methods section). The starting configurations for these simulations were a combination of pre-formed condensates and randomly distributed monomers at concentrations close to the experimental csat. As long as condensates did not fully dissolve or monomers did not entirely condense, we could identify meaningful correlations between force field parameters and csat.

Figure 5. Dependency between potential energy and critical concentration for IDPs and multidomain proteins.


(A, B) Comparison between the average per-residue potential energy calculated from condensate structures and the logarithm of the critical concentration (in mM) with varying parameters in the COCOMO force field. The critical concentration was determined by averaging the monomer concentration over the final microsecond of a 5 μs simulation. (A) shows results for six IDPs, while (B) presents data for five folded proteins. For visualization purposes, the x-axis for MAGOH in (B) has been shifted by −1.9 kJ/(mol · N). A random sample consensus (RANSAC) model was applied to determine the linear fits shown in both panels, ensuring that outliers did not skew the trendlines.

Across all tested systems, we observed linear correlations between the potential energy and the logarithm of csat, although more outliers appeared in simulations involving multi-domain proteins (Fig. 5B), likely because of the structural complexity and heterogeneity of multi-domain proteins and/or incomplete sampling. To mitigate the effect of outliers, we applied the random sample consensus (RANSAC) algorithm58, which identifies the most reliable data points when generating linear fits.

With these linear relationships established, we optimized the force field parameters toward experimental values of csat without requiring time-consuming, fully converged simulations. The optimization process involved three key steps. First, we performed a scan of the parameter space using only IDP systems to generate a preliminary set of parameters expected to result in relatively low deviations from the experimental csat values (Fig. S2). These initial parameter guesses were then refined through iterative minimization in the second step, with the goal of minimizing the difference between the experimental values and those estimated from interaction energies. Finally, we re-optimized the parameters against the experimental values using the full set of IDPs and multi-domain proteins, incorporating λ into the optimization to ensure that the model accurately captures the behavior of both disordered and folded proteins.

To validate the optimized parameters, we ran additional simulations and compared critical concentrations estimated from potential energy with those calculated directly from the simulations using the optimized parameters (Fig. S4, S5). The RMSD for IDPs was low at 0.21, indicating strong predictive accuracy, with somewhat greater variability for folded proteins, given an RMSD of 0.97. The larger discrepancy between predicted and actual csat values from simulations was most pronounced for hSUMO_hnRNPA1, likely due to a combination of multiple folded domains and disordered regions in the same system. Moreover, our approach implicitly assumes that changes in the force field do not significantly alter the morphology of the condensate, for which we have no information from the experiment. In the case of hSUMO_hnRNPA1, higher λ values led to more surface-exposed folded domains (Fig. S3). Because of that, we limited the maximum value of λ to 0.7 during our optimization.

COCOMO2 demonstrated substantially improved predictive performance compared to the original COCOMO model with λ (Fig. 4), reducing the RMSD between simulated and experimental critical concentrations from 1.94 to 0.68 for IDPs and from 2.34 to 1.28 for partially folded proteins. Without surface scaling, the original COCOMO model clearly produced overly stable condensates across all multi-domain systems tested, as there was no coexistence with monomers in the dilute phase (Fig. 4).

Overall, our optimization strategy provided a computationally efficient approach to incorporating phase separation and solubility data, resulting in improved predictive accuracy. As a result, we expect COCOMO2 to be more versatile in modeling both disordered and folded proteins.

Further evaluation of COCOMO2

We further evaluated COCOMO2 on systems not used for optimization. We tested COCOMO2 performance for predicting Rg values across both IDPs and proteins with folded domains (Fig. 6). As mentioned before, experimental Rg values were not used to optimize COCOMO2. We found notable improvement with COCOMO2. For the IDPs, we selected a representative subset of the systems tested in the original COCOMO paper26 to cover a wide range of experimental Rg values. The original COCOMO model performed well for IDPs with up to ~150 amino acid residues but underestimated the Rg values of longer chains. With the COCOMO2 parameters, the interaction energy is reduced, allowing for a more extended conformational ensemble, especially for longer chains. As a result, the predicted Rg values agree markedly better with experimental data, reducing χ2 from 3.65 to 2.11 across the IDP set (Fig. 6B). While improvements were observed across all tested systems, the effects were especially notable for MDPs. The original COCOMO parameters consistently underestimated the Rg values. The introduction of surface scaling (λ = 0.7) already provided a substantial improvement by decreasing the interactions of buried residues, reducing χ2 from 0.75 to 0.55. With the reparametrized COCOMO2 model, χ2 was further reduced to 0.24, with a marked impact on systems with extended disordered regions and small folded domains like HeV_V and NiV_V67, whose Rg values are only weakly affected by varying λ. Thus, despite not being explicitly parameterized against Rg values, COCOMO2 provides a substantial improvement over the original COCOMO in predicting Rg across a wide range of systems.
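The section reports χ2 values without giving the formula. A common convention, assumed here and not confirmed by the text, is the reduced chi-square: the mean squared deviation between simulated and experimental Rg, each normalized by the experimental uncertainty. A minimal sketch with a hypothetical function name:

```python
def reduced_chi_square(simulated, experimental, errors):
    """Mean of ((Rg_sim - Rg_exp) / sigma_exp) ** 2 over all systems (assumed definition)."""
    terms = [((s - e) / err) ** 2 for s, e, err in zip(simulated, experimental, errors)]
    return sum(terms) / len(terms)
```

Under this convention, values near 1 mean the deviations are on the order of the experimental uncertainties.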

Figure 6. Comparison of experimental vs simulated Radius of Gyration (Rg) for IDPs and multi-domain proteins.

(A) Experimental vs simulated Rg for folded proteins using COCOMO2, COCOMO with λ=0.7, and the original COCOMO parameters. (B) Experimental vs simulated Rg for IDPs using COCOMO2 and the original COCOMO parameters (λ does not affect IDPs or disordered regions). COCOMO2 agrees better with experimental Rg values than the original COCOMO model, especially for longer chains. (C) Relative differences in Rg (ΔRg [%]) between simulation and experimental data for all tested systems, grouped by protein type (top: folded proteins; bottom: IDPs). Negative values indicate underestimation of Rg by the model, and positive values indicate overestimation. COCOMO2 exhibits smaller deviations than the other models, with the largest improvements observed for folded proteins.
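The relative deviation in panel C can be computed as sketched below, assuming ΔRg is expressed relative to the experimental value so that negative values indicate underestimation, as the caption states. The function name and the example values are illustrative only.

```python
import numpy as np


def relative_rg_deviation(rg_sim, rg_exp):
    """Percent deviation of simulated from experimental Rg: 100 * (sim - exp) / exp."""
    rg_sim = np.asarray(rg_sim, dtype=float)
    rg_exp = np.asarray(rg_exp, dtype=float)
    return 100.0 * (rg_sim - rg_exp) / rg_exp


# Hypothetical values in nm: one underestimated and one overestimated chain.
deltas = relative_rg_deviation([2.7, 3.3], [3.0, 3.0])
```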

We also observed improved estimates of csat for three additional folded systems that were not included in the parameterization (RMSD decreased from 2.75 to 1.34, Fig. 4), demonstrating that the improved performance of COCOMO2 is not limited to the training set. As an additional test, we evaluated whether COCOMO2, like the original COCOMO model, could still accurately predict heterotypic phase separation50 and protein-RNA phase separation. We set up simulations with 200 μM FUS LCD, partly as monomers and partly in a pre-formed condensate, together with different ratios of (RGRGG)5 peptides (Fig. 7A). This system was used as a test set in the original COCOMO paper26 because heterotypic phase separation data are available for it. At FUS LCD/(RGRGG)5 ratios of 1:1 and 1:2, the condensate dissociated, whereas it remained stable at ratios of 1:5 and 1:10 after 10 μs of simulation. This result suggests that higher concentrations of (RGRGG)5 are required to stabilize the heterotypic condensate, consistent with the experimental results50, and confirms that COCOMO2 retains the ability to capture the system-dependent determinants of heterotypic phase separation.
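Because csat spans orders of magnitude, an RMSD between simulated and experimental values is most meaningful on a logarithmic scale. The sketch below assumes that convention; the log base behind the reported values (2.75 and 1.34) is not stated in this section, so the default `np.log10` is an assumption, and the function name is hypothetical. Only ratios enter, so the concentration unit cancels as long as both inputs use the same unit.

```python
import numpy as np


def log_csat_rmsd(csat_sim, csat_exp, log=np.log10):
    """Root-mean-square deviation between log-transformed saturation concentrations."""
    sim = log(np.asarray(csat_sim, dtype=float))
    exp = log(np.asarray(csat_exp, dtype=float))
    return float(np.sqrt(np.mean((sim - exp) ** 2)))
```

Passing `log=np.log` switches to natural-log units if that is the convention being compared against.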

Figure 7. Protein heterotypic and protein-RNA phase separation.

(A) Snapshots after 10 μs of simulation of mixtures of FUS LCD (green) and (RGRGG)₅ (blue) at varying concentration ratios. All simulations were set up with 200 μM FUS LCD and a pre-formed condensate. At low (RGRGG)₅ concentrations, the condensate dissolved. (B) Simulation snapshots of RNA-protein systems at concentrations at which phase separation was observed experimentally. Proteins are colored blue and RNA molecules red. Systems include polyAde-21/(RRLR)6, polyUra-40/FUS LCDRGG3, polyUra-10/polyArg-50, and polyAde-500/(RGRGG)₅, all started from a randomly dispersed initial state. The final states show varying degrees of phase separation, with polyAde-500/(RGRGG)₅ remaining largely dispersed.

Additionally, we tested four RNA-IDP systems (polyAde-21/(RRLR)668, polyUra-40/FUS LCDRGG350, polyUra-10/polyArg-5069, and polyAde-500/(RGRGG)₅70) at concentrations at which phase separation was observed experimentally. These simulations started from a randomly distributed state. We observed phase separation in three of the four systems, whereas polyAde-500/(RGRGG)₅ remained dispersed. Compared to the original COCOMO simulations, more IDP molecules remained in solution, consistent with the weaker protein interactions in the revised force field. COCOMO2 therefore agrees qualitatively with experiment for three of the four systems without further optimization of the protein-RNA interactions. Such optimization may be a subject of future work but may require more extensive experimental data, in particular critical concentrations of peptides and nucleic acids, in addition to qualitative observations of phase separation at specific concentrations.
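One common way to quantify how many molecules remain in solution versus in a condensate is a distance-based cluster analysis. The sketch below is a hypothetical illustration, not the analysis used here: it links molecules whose centers lie within a cutoff under the minimum-image convention for a cubic periodic box and reports the fraction in the largest cluster. The cutoff, the cubic box, the assumption that each molecule's center is already computed from an unwrapped chain, and the function names are all assumptions; bead-bead contact criteria would be more faithful for extended chains.

```python
import numpy as np


def cluster_labels(centers, box_length, cutoff):
    """Single-linkage clustering of molecule centers in a cubic periodic box.

    Two molecules are linked if their minimum-image center distance is below
    `cutoff`; linked molecules are merged with a union-find structure.
    """
    centers = np.asarray(centers, dtype=float)
    n = len(centers)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(n - 1):
        d = centers[i + 1:] - centers[i]
        d -= box_length * np.round(d / box_length)  # minimum-image convention
        for j in np.nonzero(np.linalg.norm(d, axis=1) < cutoff)[0] + i + 1:
            ri, rj = find(i), find(int(j))
            if ri != rj:
                parent[rj] = ri
    return np.array([find(i) for i in range(n)])


def fraction_in_largest_cluster(centers, box_length, cutoff):
    """Fraction of molecules in the largest cluster, a rough proxy for the dense phase."""
    _, counts = np.unique(cluster_labels(centers, box_length, cutoff), return_counts=True)
    return counts.max() / counts.sum()
```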

DISCUSSION and CONCLUSION

We present an updated version of the COCOMO coarse-grained model for peptides and proteins that extends its application to folded proteins and to proteins combining disordered regions with folded domains. Simulating folded proteins with COCOMO2 requires elastic network model restraints to keep secondary and tertiary structures intact, but it opens up new applications, including the simulation of complex assembly processes.
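Elastic network restraints of the kind mentioned above are commonly built by connecting bead pairs within a folded domain that lie closer than a cutoff, using the reference distance as the equilibrium length (in the spirit of ref. 65). The sketch below is a hypothetical illustration; the 0.9 nm default cutoff, the minimum sequence separation, and the domain labeling (-1 for disordered residues) are assumptions rather than COCOMO2 settings.

```python
import numpy as np


def elastic_network_pairs(coords, domain_ids, cutoff=0.9, min_seq_sep=3):
    """List (i, j, r0) restraint pairs within folded domains.

    coords: (N, 3) bead positions in nm from a reference structure.
    domain_ids: per-residue domain index, or -1 for disordered residues,
    which receive no restraints. Pairs closer in sequence than
    `min_seq_sep` are skipped because bonded terms already constrain them.
    """
    coords = np.asarray(coords, dtype=float)
    domain_ids = np.asarray(domain_ids)
    pairs = []
    for i in range(len(coords)):
        if domain_ids[i] < 0:
            continue
        for j in range(i + min_seq_sep, len(coords)):
            if domain_ids[j] != domain_ids[i]:
                continue
            r0 = float(np.linalg.norm(coords[j] - coords[i]))
            if r0 < cutoff:
                pairs.append((i, j, r0))
    return pairs
```

Each returned (i, j, r0) triple could then be added to a harmonic bond force, for example via OpenMM's `HarmonicBondForce.addBond(i, j, r0, k)`, with a user-chosen force constant k. Restricting pairs to the same domain leaves inter-domain motions and disordered linkers free.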

One of the core enhancements in COCOMO2 is the introduction of a surface scaling factor λ, which adjusts the interaction strength of residues based on their solvent accessibility. This modification addresses a key limitation of the original model, which overestimated the interaction strength of folded proteins. In COCOMO2, buried residues in folded domains contribute less to intermolecular interactions, giving a more accurate representation of folded domains. However, this change alone was insufficient to fully resolve the underestimation of critical concentrations observed for both IDPs and multi-domain proteins.
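To make burial-dependent interaction strength concrete, the sketch below assigns each residue a scaling factor from its relative solvent accessibility and combines two factors into a pair factor. The linear interpolation, the interpretation of λ as the factor for a fully buried residue, and the geometric-mean mixing rule are all assumptions chosen for illustration; the COCOMO2 definition is given in the Methods and may differ. Relative SASA could be obtained, for instance, from per-residue SASA (e.g. MDTraj's `shrake_rupley` with `mode='residue'`) divided by a reference maximum for each residue type.

```python
import numpy as np


def surface_scale(rel_sasa, is_folded, lam=0.7):
    """Hypothetical per-residue scaling factor.

    Disordered residues keep full strength (1.0). Residues in folded domains
    are interpolated linearly from lam (fully buried, rel_sasa = 0) to 1.0
    (fully exposed, rel_sasa >= 1).
    """
    rel = np.clip(np.asarray(rel_sasa, dtype=float), 0.0, 1.0)
    folded = np.asarray(is_folded, dtype=bool)
    return np.where(folded, lam + (1.0 - lam) * rel, 1.0)


def scaled_pair_epsilon(eps_ij, s_i, s_j):
    """Combine two per-residue factors by a geometric mean (assumed mixing rule)."""
    return eps_ij * np.sqrt(s_i * s_j)
```

Because the factor depends only on a static reference structure, it can be precomputed once per residue, which is consistent with the text's point that the scaling adds no computational cost during simulation.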

We further improved the model by incorporating phase-separation data directly into the parameterization, leveraging a linear relationship between the potential energy in condensates and the logarithm of the critical concentration. Until now, most comparable single-bead-per-residue coarse-grained models for IDPs have derived their force field parameters primarily from single-chain properties such as the radius of gyration and used phase-separation data as a control or for validation. Our approach could potentially also benefit other force fields. We observed that this method performs better for IDPs than for multi-domain proteins, for which deviations between simulated and experimental critical concentrations were more pronounced. Further optimization via conventional cycles of parameter variation and trial simulations may therefore be possible, but the greatly improved accuracy of COCOMO2 may already suffice for typical applications focused on qualitative or semi-quantitative predictions of interaction preferences and the resulting condensation and/or clustering of proteins, including those with folded domains.
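The linear relationship between condensate potential energy and the logarithm of the critical concentration can serve as a cheap surrogate during parameterization: fit ln csat against the energy once, then estimate csat for new parameter sets from their energies alone. The sketch below uses ordinary least squares via `np.polyfit`. The reference list cites RANSAC and scikit-learn, so a robust fit such as `sklearn.linear_model.RANSACRegressor` may be closer to what was actually used, but that is an inference. Function names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_log_csat(energies, csat):
    """Fit ln(csat) = a * E + b by least squares; returns (a, b)."""
    e = np.asarray(energies, dtype=float)
    y = np.log(np.asarray(csat, dtype=float))
    a, b = np.polyfit(e, y, deg=1)  # coefficients, highest degree first
    return float(a), float(b)


def predict_csat(energy, a, b):
    """Estimate csat from a condensate potential energy using the fitted line."""
    return np.exp(a * np.asarray(energy, dtype=float) + b)


# Synthetic example: stronger (more negative) energies give lower csat.
E = np.array([-10.0, -20.0, -30.0])
a, b = fit_log_csat(E, np.exp(0.1 * E + 2.0))
```

The appeal of this design is that evaluating a candidate parameter set then needs only the potential energy of a condensate configuration rather than a full direct-coexistence simulation.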

Among other available methods, the recently updated CALVADOS336 model predicts csat for multi-domain proteins with deviations similar to those of COCOMO2, typically about one order of magnitude or more from experimental values. However, COCOMO2 and CALVADOS3 follow different design philosophies. CALVADOS3 extends its applicability from IDPs to folded proteins by moving the bead in folded domains from the Cα position to the center of mass, which may implicitly capture the increased interactions of surface-exposed residues. COCOMO2 instead uses a surface scaling approach that explicitly accounts for residue burial. Moreover, COCOMO2 minimizes the number of parameters by grouping polar and hydrophobic residues, whereas CALVADOS3 assigns individual parameters to each amino acid. As a result, CALVADOS3 is more sensitive to the exact sequence of a given protein and can capture the effects of mutations better than COCOMO2, but its higher level of detail may limit generalizability. Whether this is a concern in practice remains to be seen, given the relatively small amount of experimental data on critical concentrations for phase separation available for comparison.

Optimizing COCOMO2 against phase-separation data also improved agreement with experimental Rg values for both IDPs and multi-domain proteins. The original COCOMO worked well for smaller IDPs but underestimated the Rg of longer chains. In COCOMO2, weaker interaction parameters for polar and hydrophobic residues allow more expanded conformations, improving accuracy. By comparison, however, CALVADOS3 provides substantially more accurate Rg predictions, since it was explicitly optimized to match experimental Rg values.

COCOMO2 parameterization and validation focused on critical concentrations for phase separation and single-chain Rg values, as such data are readily available from experiments. However, we expect that COCOMO2 will also be useful for studying specific assembly processes. In assembly processes, non-specific aggregation must be avoided while interactions at specific sites that lead to organized higher-order structures are favored. We expect the generic nature of COCOMO2 to be well suited for avoiding non-specific aggregation, and preferences for interactions at specific sites could be added via knowledge-based potentials71,72. Compared with the more coarse-grained models previously used to study assembly processes73–75, the advantage of this approach is that the residue-level COCOMO model provides more detail and a path to connect with atomistic models76 in multi-scale applications. These strategies will be explored in future studies.

In conclusion, COCOMO2 offers a comprehensive framework for modeling interactions among peptides, proteins, and nucleic acids that now also includes folded proteins. Its balance between simplicity and physical accuracy makes COCOMO2 a valuable resource for understanding biomolecular condensates and complex molecular environments. Because the model is highly efficient, it makes it possible to study phase separation and assembly processes on sub-μm length scales and millisecond time scales.

Supplementary Material

Supplement 1
media-1.pdf (2.3MB, pdf)

ACKNOWLEDGEMENTS

Research was primarily supported as part of the Center for Catalysis in Biomimetic Confinement, an Energy Frontier Research Center funded by the U.S. Department of Energy, Office of Science, Basic Energy Sciences under Award #DE-SC0023395. In addition, Lisa Lapidus and Michael Feig acknowledge support from the National Science Foundation under Award #MCB1817307 (prediction of concentration-dependent liquid-liquid phase separation), and Michael Feig acknowledges support from the National Institutes of Health (NIGMS) under Award #R35GM126948 (development of a coarse-grained model for use in multi-scale applications of cellular environments).

REFERENCES

  • (1). Schweke H.; Pacesa M.; Levin T.; Goverde C. A.; Kumar P.; Duhoo Y.; Dornfeld L. J.; Dubreuil B.; Georgeon S.; Ovchinnikov S.; Woolfson D. N.; Correia B. E.; Dey S.; Levy E. D. An Atlas of Protein Homo-Oligomerization across Domains of Life. Cell 2024, 187 (4), 999–1010.e15. 10.1016/j.cell.2024.01.022.
  • (2). Louros N.; Schymkowitz J.; Rousseau F. Mechanisms and Pathology of Protein Misfolding and Aggregation. Nat Rev Mol Cell Biol 2023, 24 (12), 912–933. 10.1038/s41580-023-00647-2.
  • (3). Klinge S.; Woolford J. L. Ribosome Assembly Coming into Focus. Nat Rev Mol Cell Biol 2019, 20 (2), 116–131. 10.1038/s41580-018-0078-y.
  • (4). Richter W. F.; Nayak S.; Iwasa J.; Taatjes D. J. The Mediator Complex as a Master Regulator of Transcription by RNA Polymerase II. Nat Rev Mol Cell Biol 2022, 23 (11), 732–749. 10.1038/s41580-022-00498-3.
  • (5). Beck M.; Hurt E. The Nuclear Pore Complex: Understanding Its Function through Structural Insight. Nat Rev Mol Cell Biol 2017, 18 (2), 73–89. 10.1038/nrm.2016.147.
  • (6). Perlmutter J. D.; Hagan M. F. Mechanisms of Virus Assembly. Annu Rev Phys Chem 2015, 66 (1), 217–239. 10.1146/annurev-physchem-040214-121637.
  • (7). Kerfeld C. A.; Aussignargues C.; Zarzycki J.; Cai F.; Sutter M. Bacterial Microcompartments. Nat Rev Microbiol 2018, 16 (5), 277–290. 10.1038/nrmicro.2018.10.
  • (8). Nawrocki G.; Wang P.; Yu I.; Sugita Y.; Feig M. Slow-Down in Diffusion in Crowded Protein Solutions Correlates with Transient Cluster Formation. J Phys Chem B 2017, 121 (49), 11072–11084. 10.1021/acs.jpcb.7b08785.
  • (9). Poudyal M.; Patel K.; Gadhe L.; Sawner A. S.; Kadu P.; Datta D.; Mukherjee S.; Ray S.; Navalkar A.; Maiti S.; Chatterjee D.; Devi J.; Bera R.; Gahlot N.; Joseph J.; Padinhateeri R.; Maji S. K. Intermolecular Interactions Underlie Protein/Peptide Phase Separation Irrespective of Sequence and Structure at Crowded Milieu. Nat Commun 2023, 14 (1), 6199. 10.1038/s41467-023-41864-9.
  • (10). Molliex A.; Temirov J.; Lee J.; Coughlin M.; Kanagaraj A. P.; Kim H. J.; Mittag T.; Taylor J. P. Phase Separation by Low Complexity Domains Promotes Stress Granule Assembly and Drives Pathological Fibrillization. Cell 2015, 163 (1), 123–133. 10.1016/j.cell.2015.09.015.
  • (11). Wei J.; Yoshimura S. H. The Role of Liquid–Liquid Phase Separation in the Structure and Function of Nucleolus. In Phase Separation in Living Cells; Springer Nature Singapore: Singapore, 2023; pp 191–206. 10.1007/978-981-99-4886-4_11.
  • (12). Luo Y.; Na Z.; Slavoff S. A. P-Bodies: Composition, Properties, and Functions. Biochemistry 2018, 57 (17), 2424–2431. 10.1021/acs.biochem.7b01162.
  • (13). Shin Y.; Brangwynne C. P. Liquid Phase Condensation in Cell Physiology and Disease. Science 2017, 357 (6357), eaaf4382. 10.1126/science.aaf4382.
  • (14). Falahati H.; Haji-Akbari A. Thermodynamically Driven Assemblies and Liquid–Liquid Phase Separations in Biology. Soft Matter 2019, 15 (6), 1135–1154. 10.1039/C8SM02285B.
  • (15). Mehta S.; Zhang J. Liquid–Liquid Phase Separation Drives Cellular Function and Dysfunction in Cancer. Nat Rev Cancer 2022, 22 (4), 239–252. 10.1038/s41568-022-00444-7.
  • (16). Paloni M.; Bailly R.; Ciandrini L.; Barducci A. Unraveling Molecular Interactions in Liquid–Liquid Phase Separation of Disordered Proteins by Atomistic Simulations. J Phys Chem B 2020, 124 (41), 9009–9016. 10.1021/acs.jpcb.0c06288.
  • (17). Zheng W.; Dignon G. L.; Jovic N.; Xu X.; Regy R. M.; Fawzi N. L.; Kim Y. C.; Best R. B.; Mittal J. Molecular Details of Protein Condensates Probed by Microsecond Long Atomistic Simulations. J Phys Chem B 2020, 124 (51), 11671–11679. 10.1021/acs.jpcb.0c10489.
  • (18). Shi S.; Zhao L.; Lu Z.-Y. Coarse-Grained Modeling of Liquid–Liquid Phase Separation in Cells: Challenges and Opportunities. J Phys Chem Lett 2024, 15 (28), 7280–7287. 10.1021/acs.jpclett.4c01261.
  • (19). Tesei G.; Schulze T. K.; Crehuet R.; Lindorff-Larsen K. Accurate Model of Liquid–Liquid Phase Behavior of Intrinsically Disordered Proteins from Optimization of Single-Chain Properties. Proc. Natl. Acad. Sci. USA 2021, 118 (44), e2111696118. 10.1073/pnas.2111696118.
  • (20). Dignon G. L.; Zheng W.; Kim Y. C.; Best R. B.; Mittal J. Sequence Determinants of Protein Phase Behavior from a Coarse-Grained Model. PLoS Comput Biol 2018, 14 (1), e1005941. 10.1371/journal.pcbi.1005941.
  • (21). Hollingsworth S. A.; Dror R. O. Molecular Dynamics Simulation for All. Neuron 2018, 99 (6), 1129–1143. 10.1016/j.neuron.2018.08.011.
  • (22). Predeus A. V.; Gul S.; Gopal S. M.; Feig M. Conformational Sampling of Peptides in the Presence of Protein Crowders from AA/CG-Multiscale Simulations. J Phys Chem B 2012, 116 (29), 8610–8620. 10.1021/jp300129u.
  • (23). Kar P.; Feig M. Chapter Five - Recent Advances in Transferable Coarse-Grained Modeling of Proteins. In Biomolecular Modelling and Simulations; Karabencheva-Christova, T., Ed.; Advances in Protein Chemistry and Structural Biology; Academic Press, 2014; Vol. 96, pp 143–180. 10.1016/bs.apcsb.2014.06.005.
  • (24). Pak A. J.; Voth G. A. Advances in Coarse-Grained Modeling of Macromolecular Complexes. Curr Opin Struct Biol 2018, 52, 119–126. 10.1016/j.sbi.2018.11.005.
  • (25). Joshi S. Y.; Deshmukh S. A. A Review of Advancements in Coarse-Grained Molecular Dynamics Simulations. Mol Simul 2021, 47 (10–11), 786–803. 10.1080/08927022.2020.1828583.
  • (26). Valdes-Garcia G.; Heo L.; Lapidus L. J.; Feig M. Modeling Concentration-Dependent Phase Separation Processes Involving Peptides and RNA via Residue-Based Coarse-Graining. J Chem Theory Comput 2023, 19 (2), 669–678. 10.1021/acs.jctc.2c00856.
  • (27). Martin E. W.; Thomasen F. E.; Milkovic N. M.; Cuneo M. J.; Grace C. R.; Nourse A.; Lindorff-Larsen K.; Mittag T. Interplay of Folded Domains and the Disordered Low-Complexity Domain in Mediating hnRNPA1 Phase Separation. Nucleic Acids Res 2021, 49 (5), 2931–2945. 10.1093/nar/gkab063.
  • (28). Arora S.; Roy D. S.; Maiti S.; Ainavarapu S. R. K. Phase Separation and Aggregation of a Globular Folded Protein Small Ubiquitin-like Modifier 1 (SUMO1). J Phys Chem Lett 2023, 14 (40), 9060–9068. 10.1021/acs.jpclett.3c02092.
  • (29). Li P.; Banjade S.; Cheng H.-C.; Kim S.; Chen B.; Guo L.; Llaguno M.; Hollingsworth J. V.; King D. S.; Banani S. F.; Russo P. S.; Jiang Q.-X.; Nixon B. T.; Rosen M. K. Phase Transitions in the Assembly of Multivalent Signalling Proteins. Nature 2012, 483 (7389), 336–340. 10.1038/nature10879.
  • (30). Onufriev A. V.; Case D. A. Generalized Born Implicit Solvent Models for Biomolecules. Annu Rev Biophys 2019, 48 (1), 275–296. 10.1146/annurev-biophys-052118-115325.
  • (31). Feig M.; Brooks C. L. Recent Advances in the Development and Application of Implicit Solvent Models in Biomolecule Simulations. Curr Opin Struct Biol 2004, 14 (2), 217–224. 10.1016/j.sbi.2004.03.009.
  • (32). Riback J. A.; Zhu L.; Ferrolino M. C.; Tolbert M.; Mitrea D. M.; Sanders D. W.; Wei M.-T.; Kriwacki R. W.; Brangwynne C. P. Composition-Dependent Thermodynamics of Intracellular Phase Separation. Nature 2020, 581 (7807), 209–214. 10.1038/s41586-020-2256-2.
  • (33). Huggins M. L. Solutions of Long Chain Compounds. J Chem Phys 1941, 9 (5), 440–440. 10.1063/1.1750930.
  • (34). Flory P. J. Thermodynamics of High Polymer Solutions. J Chem Phys 1942, 10 (1), 51–61. 10.1063/1.1723621.
  • (35). Qian D.; Michaels T. C. T.; Knowles T. P. J. Analytical Solution to the Flory–Huggins Model. J Phys Chem Lett 2022, 13 (33), 7853–7860. 10.1021/acs.jpclett.2c01986.
  • (36). Cao F.; von Bülow S.; Tesei G.; Lindorff-Larsen K. A Coarse-grained Model for Disordered and Multi-domain Proteins. Protein Sci. 2024, 33 (11), e5172. 10.1002/pro.5172.
  • (37). Regy R. M.; Thompson J.; Kim Y. C.; Mittal J. Improved Coarse-grained Model for Studying Sequence Dependent Phase Separation of Disordered Proteins. Protein Sci. 2021, 30 (7), 1371–1379. 10.1002/pro.4094.
  • (38). Dannenhoffer-Lafage T.; Best R. B. A Data-Driven Hydrophobicity Scale for Predicting Liquid–Liquid Phase Separation of Proteins. J Phys Chem B 2021, 125 (16), 4046–4056. 10.1021/acs.jpcb.0c11479.
  • (39). Ghobadi A. F.; Jayaraman A. Effect of Backbone Chemistry on Hybridization Thermodynamics of Oligonucleic Acids: A Coarse-Grained Molecular Dynamics Simulation Study. Soft Matter 2016, 12 (8), 2276–2287. 10.1039/C5SM02868J.
  • (40). Dutagaci B.; Nawrocki G.; Goodluck J.; Ashkarran A. A.; Hoogstraten C. G.; Lapidus L. J.; Feig M. Charge-Driven Condensation of RNA and Proteins Suggests Broad Role of Phase Separation in Cytoplasmic Environments. Elife 2021, 10 (26), e64004. 10.7554/eLife.64004.
  • (41). Eastman P.; Galvelis R.; Peláez R. P.; Abreu C. R. A.; Farr S. E.; Gallicchio E.; Gorenko A.; Henry M. M.; Hu F.; Huang J.; Krämer A.; Michel J.; Mitchell J. A.; Pande V. S.; Rodrigues J. P.; Rodriguez-Guerra J.; Simmonett A. C.; Singh S.; Swails J.; Turner P.; Wang Y.; Zhang I.; Chodera J. D.; De Fabritiis G.; Markland T. E. OpenMM 8: Molecular Dynamics Simulation with Machine Learning Potentials. J Phys Chem B 2024, 128 (1), 109–116. 10.1021/acs.jpcb.3c06662.
  • (42). McGibbon R. T.; Beauchamp K. A.; Harrigan M. P.; Klein C.; Swails J. M.; Hernández C. X.; Schwantes C. R.; Wang L.-P.; Lane T. J.; Pande V. S. MDTraj: A Modern Open Library for the Analysis of Molecular Dynamics Trajectories. Biophys J 2015, 109 (8), 1528–1532. 10.1016/j.bpj.2015.08.015.
  • (43). Eisenhaber F.; Lijnzaad P.; Argos P.; Sander C.; Scharf M. The Double Cubic Lattice Method: Efficient Approaches to Numerical Integration of Surface Area and Volume and to Dot Surface Contouring of Molecular Assemblies. J Comput Chem 1995, 16 (3), 273–284. 10.1002/jcc.540160303.
  • (44). Abraham M. J.; Murtola T.; Schulz R.; Páll S.; Smith J. C.; Hess B.; Lindahl E. GROMACS: High Performance Molecular Simulations through Multi-Level Parallelism from Laptops to Supercomputers. SoftwareX 2015, 1–2, 19–25. 10.1016/j.softx.2015.06.001.
  • (45). Golovanov A. P.; Hautbergue G. M.; Wilson S. A.; Lian L.-Y. A Simple Method for Improving Protein Solubility and Long-Term Stability. J Am Chem Soc 2004, 126 (29), 8933–8939. 10.1021/ja049297h.
  • (46). Humphrey W.; Dalke A.; Schulten K. VMD: Visual Molecular Dynamics. J Mol Graph 1996, 14 (1), 33–38. 10.1016/0263-7855(96)00018-5.
  • (47). Li L.; Kiick K. L. Resilin-Based Materials for Biomedical Applications. ACS Macro Lett 2013, 2 (8), 635–640. 10.1021/mz4002194.
  • (48). Dai Y.; Farag M.; Lee D.; Zeng X.; Kim K.; Son H.; Guo X.; Su J.; Peterson N.; Mohammed J.; Ney M.; Shapiro D. M.; Pappu R. V.; Chilkoti A.; You L. Programmable Synthetic Biomolecular Condensates for Cellular Control. Nat Chem Biol 2023, 19 (4), 518–528. 10.1038/s41589-022-01252-8.
  • (49). Murthy A. C.; Dignon G. L.; Kan Y.; Zerze G. H.; Parekh S. H.; Mittal J.; Fawzi N. L. Molecular Interactions Underlying Liquid−liquid Phase Separation of the FUS Low-Complexity Domain. Nat Struct Mol Biol 2019, 26 (7), 637–648. 10.1038/s41594-019-0250-x.
  • (50). Kaur T.; Raju M.; Alshareedah I.; Davis R. B.; Potoyan D. A.; Banerjee P. R. Sequence-Encoded and Composition-Dependent Protein-RNA Interactions Control Multiphasic Condensate Morphologies. Nat Commun 2021, 12 (1), 872. 10.1038/s41467-021-21089-4.
  • (51). Elbaum-Garfinkle S.; Kim Y.; Szczepaniak K.; Chen C. C.-H.; Eckmann C. R.; Myong S.; Brangwynne C. P. The Disordered P Granule Protein LAF-1 Drives Phase Separation into Droplets with Tunable Viscosity and Dynamics. Proc. Natl. Acad. Sci. USA 2015, 112 (23), 7189–7194. 10.1073/pnas.1504822112.
  • (52). Bremer A.; Farag M.; Borcherds W. M.; Peran I.; Martin E. W.; Pappu R. V.; Mittag T. Deciphering How Naturally Occurring Sequence Features Impact the Phase Behaviours of Disordered Prion-like Domains. Nat Chem 2022, 14 (2), 196–207. 10.1038/s41557-021-00840-w.
  • (53). Ambadipudi S.; Biernat J.; Riedel D.; Mandelkow E.; Zweckstetter M. Liquid–Liquid Phase Separation of the Microtubule-Binding Repeats of the Alzheimer-Related Protein Tau. Nat Commun 2017, 8 (1), 275. 10.1038/s41467-017-00480-0.
  • (54). Ray S.; Singh N.; Kumar R.; Patel K.; Pandey S.; Datta D.; Mahato J.; Panigrahi R.; Navalkar A.; Mehra S.; Gadhe L.; Chatterjee D.; Sawner A. S.; Maiti S.; Bhatia S.; Gerez J. A.; Chowdhury A.; Kumar A.; Padinhateeri R.; Riek R.; Krishnamoorthy G.; Maji S. K. α-Synuclein Aggregation Nucleates through Liquid–Liquid Phase Separation. Nat Chem 2020, 12 (8), 705–716. 10.1038/s41557-020-0465-9.
  • (55). Kataoka N.; Diem M. D.; Kim V. N.; Yong J.; Dreyfuss G. Magoh, a Human Homolog of Drosophila Mago Nashi Protein, Is a Component of the Splicing-Dependent Exon-Exon Junction Complex. EMBO J 2001, 20 (22), 6424–6433. 10.1093/emboj/20.22.6424.
  • (56). Stutz F.; Bachi A.; Doerks T.; Braun I. C.; Seraphin B.; Wilm M.; Bork P.; Izaurralde E. REF, an Evolutionarily Conserved Family of hnRNP-like Proteins, Interacts with TAP/Mex67p and Participates in mRNA Nuclear Export. RNA 2000, 6 (4), S1355838200000078. 10.1017/S1355838200000078.
  • (57). Kataoka N.; Yong J.; Kim V. N.; Velazquez F.; Perkinson R. A.; Wang F.; Dreyfuss G. Pre-mRNA Splicing Imprints mRNA in the Nucleus with a Novel RNA-Binding Protein That Persists in the Cytoplasm. Mol Cell 2000, 6 (3), 673–682. 10.1016/S1097-2765(00)00065-4.
  • (58). Fischler M. A.; Bolles R. C. Random Sample Consensus. Commun ACM 1981, 24 (6), 381–395. 10.1145/358669.358692.
  • (59). Pedregosa F.; Varoquaux G.; Gramfort A.; Michel V.; Thirion B.; Grisel O.; Blondel M.; Prettenhofer P.; Weiss R.; Dubourg V.; Vanderplas J.; Passos A.; Cournapeau D.; Brucher M.; Perrot M.; Duchesnay É. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
  • (60). Zhu C.; Byrd R. H.; Lu P.; Nocedal J. Algorithm 778: L-BFGS-B. ACM Transactions on Mathematical Software 1997, 23 (4), 550–560. 10.1145/279232.279236.
  • (61). Katahira J.; Sträßer K.; Podtelejnikov A.; Mann M.; Jung J. U.; Hurt E. The Mex67p-Mediated Nuclear mRNA Export Pathway Is Conserved from Yeast to Human. EMBO J 1999, 18 (9), 2593–2609. 10.1093/emboj/18.9.2593.
  • (62). Wang J.; Choi J.-M.; Holehouse A. S.; Lee H. O.; Zhang X.; Jahnel M.; Maharana S.; Lemaitre R.; Pozniakovsky A.; Drechsel D.; Poser I.; Pappu R. V.; Alberti S.; Hyman A. A. A Molecular Grammar Governing the Driving Forces for Phase Separation of Prion-like RNA Binding Proteins. Cell 2018, 174 (3), 688–699.e16. 10.1016/j.cell.2018.06.006.
  • (63). Cornell M.; Evans D. A. P.; Mann R.; Fostier M.; Flasza M.; Monthatong M.; Artavanis-Tsakonas S.; Baron M. The Drosophila Melanogaster Suppressor of Deltex Gene, a Regulator of the Notch Receptor Signaling Pathway, Is an E3 Class Ubiquitin Ligase. Genetics 1999, 152 (2), 567–576. 10.1093/genetics/152.2.567.
  • (64). Cao F.; von Bülow S.; Tesei G.; Lindorff-Larsen K. A Coarse-Grained Model for Disordered and Multi-Domain Proteins. bioRxiv 2024, 2024.02.03.578735. 10.1101/2024.02.03.578735.
  • (65). Periole X.; Cavalli M.; Marrink S.-J.; Ceruso M. A. Combining an Elastic Network With a Coarse-Grained Molecular Force Field: Structure, Dynamics, and Intermolecular Recognition. J Chem Theory Comput 2009, 5 (9), 2531–2543. 10.1021/ct9002114.
  • (66). Lomize A. L.; Reibarkh M. Y.; Pogozheva I. D. Interatomic Potentials and Solvation Parameters from Protein Engineering Data for Buried Residues. Protein Sci. 2002, 11 (8), 1984–2000. 10.1110/ps.0307002.
  • (67). Salladini E.; Delauzun V.; Longhi S. The Henipavirus V Protein Is a Prevalently Unfolded Protein with a Zinc-Finger Domain Involved in Binding to DDB1. Mol. BioSyst. 2017, 13 (11), 2254–2267. 10.1039/C7MB00488E.
  • (68). Bai Q.; Zhang Q.; Jing H.; Chen J.; Liang D. Liquid–Liquid Phase Separation of Peptide/Oligonucleotide Complexes in Crowded Macromolecular Media. J Phys Chem B 2021, 125 (1), 49–57. 10.1021/acs.jpcb.0c09225.
  • (69). Fisher R. S.; Elbaum-Garfinkle S. Tunable Multiphase Dynamics of Arginine and Lysine Liquid Condensates. Nat Commun 2020, 11 (1), 4628. 10.1038/s41467-020-18224-y.
  • (70). Alshareedah I.; Kaur T.; Ngo J.; Seppala H.; Kounatse L.-A. D.; Wang W.; Moosa M. M.; Banerjee P. R. Interplay between Short-Range Attraction and Long-Range Repulsion Controls Reentrant Liquid Condensation of Ribonucleoprotein–RNA Complexes. J Am Chem Soc 2019, 141 (37), 14593–14602. 10.1021/jacs.9b03689.
  • (71). Takada S. Gō Model Revisited. Biophys Physicobiol 2019, 16 (0), 248–255. 10.2142/biophysico.16.0_248.
  • (72). Bačić Toplek F.; Scalone E.; Stegani B.; Paissoni C.; Capelli R.; Camilloni C. Multi-EGO: Model Improvements toward the Study of Complex Self-Assembly Processes. J Chem Theory Comput 2024, 20 (1), 459–468. 10.1021/acs.jctc.3c01182.
  • (73). Rapaport D. C. Molecular Dynamics Study of T = 3 Capsid Assembly. J Biol Phys 2018, 44 (2), 147–162. 10.1007/s10867-018-9486-7.
  • (74). Hagan M. F.; Mohajerani F. Self-Assembly Coupled to Liquid-Liquid Phase Separation. PLoS Comput Biol 2023, 19 (5), e1010652. 10.1371/journal.pcbi.1010652.
  • (75). Mohajerani F.; Hagan M. F. The Role of the Encapsulated Cargo in Microcompartment Assembly. PLoS Comput Biol 2018, 14 (7), e1006351. 10.1371/journal.pcbi.1006351.
  • (76). Heo L.; Feig M. One Bead per Residue Can Describe All-Atom Protein Structures. Structure 2024, 32 (1), 97–111.e6. 10.1016/j.str.2023.10.013.

Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints