Abstract
It has become increasingly evident that the conformational distributions of intrinsically disordered proteins or regions are strongly dependent on their amino acid compositions and sequence. To facilitate a systematic investigation of these sequence-ensemble relationships, we selected a set of 16 naturally occurring intrinsically disordered regions of identical length but with large differences in amino acid composition, hydrophobicity, and charge patterning. We probed their conformational ensembles with single-molecule Förster resonance energy transfer (FRET), complemented by circular dichroism (CD) and nuclear magnetic resonance (NMR) spectroscopy as well as small-angle X-ray scattering (SAXS). The set of disordered proteins shows a strong dependence of the chain dimensions on sequence composition, with chain volumes differing by up to a factor of 6. The residue-specific intrachain interaction networks that underlie these pronounced differences were identified using atomistic simulations combined with ensemble reweighting, revealing the important role of charged, aromatic, and polar residues. To advance a transferable description of disordered protein regions, we further employed the experimental data to parametrize a coarse-grained model for disordered proteins that includes an explicit representation of the FRET fluorophores and successfully describes experiments with different dye pairs. Our findings demonstrate the value of integrating experiments and simulations for advancing our quantitative understanding of the sequence features that determine the conformational ensembles of intrinsically disordered proteins.
Keywords: intrinsically disordered proteins, single-molecule spectroscopy, Förster resonance energy transfer (FRET), atomistic simulations, coarse-grained simulations, chain dimensions, local expansion and compaction
Introduction
Large parts of the proteomes of higher eukaryotes consist of intrinsically disordered proteins (IDPs), which do not adopt a well-defined three-dimensional structure under physiological conditions.1 For instance, ∼58% of human proteins contain both folded domains and intrinsically disordered regions (IDRs).2 IDRs occur in a variety of structural contexts, from tails and linkers between folded domains to fully disordered proteins, and they are particularly prevalent in regulation, such as in transcription and signaling,3 as well as in cellular organization via phase separation.4 Despite the lack of a well-defined tertiary structure, however, the conformational properties of IDPs are far from uniform: They range from compact states that can be rich in secondary structure to less compact ensembles all the way to highly expanded chains with no detectable secondary structure.5−16
For classifying and quantifying this continuous spectrum of disorder, concepts from polymer physics can be useful.3,14,17,18 For instance, based on the combination of net charge per residue and fraction of charged residues, IDPs can be grouped into strong and weak polyelectrolytes and polyampholytes,19 and classified by their chain dimensions in terms of ensemble-averaged quantities, such as their hydrodynamic radius, radius of gyration, or end-to-end distance.6,12,18,20,21 A helpful quantity for characterizing the dimensions of unfolded and disordered proteins independent of chain length is the scaling exponent, ν,9,22 which relates the chain dimensions, R, to the number of residues or chain segments, N, as R ∝ N^ν. For infinitely long homopolymers, ν can take values of 1/3 for compact globules (and globular folded proteins), 1/2 for Flory random coils, and ∼0.588 for excluded volume chains.22 However, intermediate and larger values are commonly observed in simulations and experiments.9,16,23−25 Examples are highly charged sequences with pronounced electrostatic repulsion,9,14,26 which can approach ν ≈ 1 for rod-like conformations.27 Other reasons for deviations from canonical scaling are finite-size effects28,29 and heterogeneous patterns of intrachain interactions owing to the heteropolymeric nature of IDPs,12,30 especially the contributions of high fractions of charged residues7−9 and charge patterning.10
Considerable effort has been made to relate the dimensions of IDPs to their sequence properties and enable a predictive understanding of how intrachain interactions determine the sizes and shapes of the IDPs. Emerging consensus suggests that sequences rich in hydrophobic residues and certain polar tracts tend to favor compaction, whereas sequences rich in charged residues and proline tend to be more expanded.3,14−16,18,31−35 Polyelectrolytes dominated by a single type of charge are most expanded,7,8,31 whereas the attraction of opposite charges in polyampholytes can lead to compaction or long-range structural preferences depending on the patterning of oppositely charged residues.7,10,11,36 The use of coarse-grained models parametrized based on experimental results15,37−40 has enabled steps toward the analysis of conformational distributions across entire proteomes.16,25 However, the systematic quantitative assessment of sequence contributions and the parametrization of IDP models is still complicated by the heterogeneity of molecular systems that have been studied experimentally, which usually vary both in length and sequence composition, and are investigated under disparate solution conditions. To furnish a data set that avoids such limitations, we thus selected naturally occurring IDRs of identical lengths but with very different sequence properties and probed their intrachain distances by single-molecule Förster resonance energy transfer (FRET). In selected cases, we used complementary experimental methods, especially small-angle X-ray scattering (SAXS) for quantifying chain dimensions and NMR spectroscopy for identifying residue-specific intrachain interactions. We analyzed the results using atomistic simulations based on the ABSINTH model41 to identify main determinants of chain dimensions and used the data to optimize a coarse-grained IDP model with an explicit representation of the fluorophores.
Results
We selected 16 IDRs, each comprising 57 residues, from the linker regions connecting folded domains in RNA-binding proteins (Figure 1A, Figure S1). The large number of known RNA-binding proteins42 allows for a wide variety of available sequences with very different physicochemical properties, while at the same time ensuring that the selected sequences are biologically relevant in terms of their amino acid composition. The sequence conservation of IDRs in RNA-binding proteins43 attests to their functional importance beyond tethering of the folded domains. Their functions are possibly related to posttranslational modifications, interactions with the folded domains or RNA, or the optimal spacing of domains resulting from the sequence-encoded chain dimensions of the IDRs.44 By scoring the corresponding linker sequences available in UniProtKB45 for average hydrophobicity, net charge, fraction of charged residues, charge patterning, and amino acid composition (Figure S1), we identified examples that maximize the diversity of these properties (Figure 1B–D), from low to high hydrophobicity; from low to high net charge; from polyelectrolytes to polyampholytes; from low to high charge segregation; and including examples enriched in individual amino acids, such as Gly, Glu, Ala, and Asn (see the Materials and Methods section for details). Only sequences with a disorder score in Metapredict46 above 0.5 along the entire sequence were selected (Figure S2). Far-UV circular dichroism spectra of the recombinantly produced IDRs confirm the absence of pronounced secondary structure (Figure S3). The remaining differences between the spectra are suggestive of sequence-specific contributions to the conformational ensembles, but they are difficult to analyze quantitatively. Taken together, we have thus identified a set of naturally occurring IDRs of identical length that cover a broad spectrum of the key parameters commonly used to assess the properties of disordered proteins.
Figure 1.
Sequences and sequence properties of the selected IDRs. (A) Selected sequences of IDRs, with acidic amino acids shown in red, basic residues in blue, Pro residues in brown, and Cys residues used for labeling in bold. The residues beyond the Cys residues toward the termini (gray) result from the cloning/expression strategy used; these residues and the Cys residues were not included in calculating sequence parameters. (B–D) Average sequence properties cover a wide range. Hydrophobicity, net charge per residue (NCPR), and fraction of charged residues (FCR) are normalized by the total number of residues in each sequence. (B) The line indicates the separation of IDRs and folded proteins (gray region) suggested by Uversky et al.47 (C) Most selected IDRs fall in the region of weak polyampholytes and polyelectrolytes with FCR ≤ 0.35 and |NCPR| ≤ 0.35. IDRs with FCR > 0.35 and |NCPR| ≤ 0.35 are considered strong polyampholytes, and those with FCR > 0.35 and |NCPR| > 0.35 strong polyelectrolytes19 (dashed lines). The gray region cannot be populated. (D) The charge patterning metric κ describes the distribution of charged amino acids along the chain.10
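The sequence parameters in Figure 1 follow simple per-residue definitions. The sketch below computes FCR, NCPR, a mean Kyte–Doolittle hydropathy, the fraction of a given residue (such as Gly), and SCD for a sequence string. It assumes that Lys and Arg carry +1, Asp and Glu carry −1, and His is neutral; that the mean hydropathy is rescaled from [−4.5, 4.5] to [0, 1] (a common convention, assumed here); and that SCD uses the definition (1/N) Σ_{i<j} q_i q_j |i − j|^{1/2}. κ is omitted because its blob-based normalization is more involved. All function names are hypothetical, and the flanking residues and Cys should be stripped from the sequence before calling them.

```python
# Hypothetical helpers for per-sequence descriptors (not the authors' code).
# Assumptions: K/R = +1, D/E = -1, His neutral; hydropathy rescaled to [0, 1].
KD_SCALE = {  # Kyte-Doolittle hydropathy values
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def charges(seq):
    """Per-residue charges under the assumed charge assignment."""
    return [CHARGE.get(aa, 0) for aa in seq]


def fcr(seq):
    """Fraction of charged residues."""
    return sum(1 for q in charges(seq) if q != 0) / len(seq)


def ncpr(seq):
    """Net charge per residue."""
    return sum(charges(seq)) / len(seq)


def mean_hydropathy(seq, normalized=True):
    """Mean Kyte-Doolittle hydropathy, optionally mapped from [-4.5, 4.5] to [0, 1]."""
    mean = sum(KD_SCALE[aa] for aa in seq) / len(seq)
    return (mean + 4.5) / 9.0 if normalized else mean


def fraction_of(seq, residue):
    """Fraction of positions occupied by one residue type, e.g. 'G' for %G."""
    return seq.count(residue) / len(seq)


def scd(seq):
    """Sequence charge decoration: (1/N) * sum_{i<j} q_i q_j |i - j|^(1/2)."""
    q = charges(seq)
    n = len(q)
    total = sum(q[i] * q[j] * (j - i) ** 0.5
                for i in range(n) for j in range(i + 1, n))
    return total / n
```

Strongly negative SCD values indicate that opposite charges are well mixed along the chain, whereas positive values arise when like charges dominate or cluster.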
Chain Dimensions Vary Widely among Sequences
The selected IDR sequences, bracketed with Cys residues for fluorophore labeling via maleimide chemistry, were expressed recombinantly, purified, and labeled with donor and acceptor dyes for single-molecule FRET. By working at picomolar protein concentrations in single-molecule measurements, we could avoid aggregation and phase separation, even for sequences with low solubility that are exceedingly difficult to investigate with ensemble experiments at high concentrations. Moreover, by resolving conformational subpopulations in single-molecule experiments, species such as small aggregates (which may go unnoticed in ensemble measurements) can be detected48 and prevented by optimized sample handling or separated out in the analysis (Figure S4). We performed multiparameter confocal single-molecule measurements using pulsed interleaved excitation49 with all 16 labeled IDRs and identified a single transfer efficiency peak under our experimental conditions for each sequence. The results reveal a remarkably broad range of intramolecular transfer efficiencies from ∼0.4 to ∼0.9 for the different IDRs despite their identical chain lengths (Figure 2A), reflecting the pronounced dependence of the chain dimensions on amino acid sequence.
Figure 2.
Chain dimensions from single-molecule FRET and correlations with sequence parameters. (A, left) Transfer efficiency histograms of Cy3B/CF660R-labeled IDRs (gray) fitted with Gaussian peak functions (color). The average of the mean transfer efficiency, ⟨E⟩, from at least three experiments is indicated by a black dashed line, with the standard deviation shown as a gray band. Vertical gray lines indicate the lowest and highest average ⟨E⟩ observed across the series. (A, right) Distance distributions based on the SAW-ν model26 (colored line), reweighted ensembles from ABSINTH41 simulations (gray line), and from an optimized coarse-grained model (black line). Vertical dashed lines indicate the average root-mean-squared distance (R) from SAW-ν, with a gray error band based on a systematic uncertainty of ±7% in the Förster radius.39,53 Values for the average R and average ν are shown in the top right corners. Vertical gray lines indicate the largest and smallest values of R across the series. (B) Correlations between different sequence parameters and average transfer efficiency. The dashed line and the coefficient of determination, ρ², were obtained from linear regression. (C) Multiple linear regression was used to identify combinations of sequence parameters that maximize the correlation with transfer efficiency. The regression coefficients for each of the sequence parameters used as regressors (%G, κ, and SCD) are indicated. The dashed line is the identity line, and ρc is the concordance correlation coefficient. Color code for the sequences is given in Figure 1.
We had previously observed that the charge of the fluorophores needs to be accounted for to quantitatively explain the dimensions of IDPs with simple polymer models.7,9 We thus used two different FRET pairs with different net charges to assess such effects. One widely used pair comprises the dyes Alexa 488 and Alexa 594 (Förster radius R0 = 5.4 nm), each of which carries a net charge of −2; the other pair comprises the dyes Cy3B and CF660R50 (R0 = 6.0 nm), which carry a net charge of 0 and −1, respectively (Figure S5). We find that protein sequences rich in basic residues yield lower average intramolecular distances when labeled with the more negatively charged Alexa pair, whereas other sequences yield very similar results for both dye pairs (Figure S6A). Similarly, SAXS measurements of the Alexa 488-labeled IDR dCh– showed the increase in Rg expected from the addition of the fluorophore compared to unlabeled dCh–, but for sNh+ the increase was much smaller (Figure S6B). NMR spectroscopy confirmed the presence of more attractive interactions between positively charged residues and the Alexa fluorophores than with Cy3B and CF660R (Figure S6C,D). The large range of transfer efficiencies and chain dimensions we observe is robust with respect to the dye pair used, but to minimize the influence of the dyes on the FRET-based assessment of chain dimensions, we focus on the results obtained with Cy3B/CF660R.
Using the SAW-ν model, a semiempirical approximation with an adjustable length scaling exponent,26 to infer intramolecular distance distributions for the different sequences, we obtained root-mean-squared end-to-end distances, R, between 4.0 and 7.3 nm (Figure 2A). This corresponds to almost a factor of 2 between the most compact (sGrich) and the most expanded IDR (sNh-), and a factor of ∼6 in chain volume. The inferred average scaling exponents,26 ν, range from 0.47 to 0.61, corresponding to the range from effective theta conditions to excluded volume chains.18,22 Although a detailed interpretation of these scaling exponents is complicated by finite-size effects29 and the contributions from heterogeneous interaction patterns within heteropolymers,12,30 they imply that these IDRs are rather open chains, more expanded than a compact globule. Notably, the two Gly-rich sequences, sGrich and dGrich, are among the most compact of the set, suggesting an important role for Gly in chain compaction.
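The inference of R and ν from ⟨E⟩ can be sketched as follows. The sketch assumes a SAW-ν distance distribution of the form P(r) ∝ r^{2+g} exp[−B(r/R)^δ] with δ = 1/(1 − ν) and g = (γ − 1)/ν, and R = bN^ν; the values γ ≈ 1.1615 and b = 0.55 nm are assumed literature values that should be checked against ref 26. The code normalizes P(r), fixes B so that ⟨r²⟩ = R², averages the Förster efficiency E(r) = 1/[1 + (r/R0)^6], and inverts for ν by bisection. It ignores dye linkers and dye dynamics, N is a hypothetical number of segments between the dyes, and the function names are illustrative; this is not the authors' analysis code.

```python
import math

import numpy as np

GAMMA = 1.1615  # assumed SAW exponent entering g = (gamma - 1) / nu
B_SEG = 0.55    # assumed length prefactor b (nm) in R = b * N**nu


def trapezoid(y, x):
    """Trapezoidal integral of samples y on the grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def saw_nu_pdf(r, R, nu):
    """SAW-nu-type distance distribution P(r), normalized, with <r^2> = R^2."""
    g = (GAMMA - 1.0) / nu
    delta = 1.0 / (1.0 - nu)
    # int_0^inf x^a exp(-B x^d) dx = Gamma((a+1)/d) / (d * B^((a+1)/d));
    # requiring <(r/R)^2> = 1 fixes B in closed form.
    B = (math.gamma((5.0 + g) / delta) / math.gamma((3.0 + g) / delta)) ** (delta / 2.0)
    norm = math.gamma((3.0 + g) / delta) / (delta * B ** ((3.0 + g) / delta))
    x = r / R
    return x ** (2.0 + g) * np.exp(-B * x ** delta) / (norm * R)


def mean_efficiency(nu, n_seg, r0):
    """<E> over P(r) with E(r) = 1 / (1 + (r / r0)^6); distances in nm."""
    R = B_SEG * n_seg ** nu
    r = np.linspace(0.0, 6.0 * R, 20001)  # tail beyond 6R is negligible
    e = 1.0 / (1.0 + (r / r0) ** 6)
    return trapezoid(saw_nu_pdf(r, R, nu) * e, r)


def infer_nu(e_obs, n_seg, r0, lo=0.34, hi=0.75, tol=1e-6):
    """Bisection for the nu that reproduces e_obs (<E> decreases as nu increases)."""
    f_lo = mean_efficiency(lo, n_seg, r0) - e_obs
    if f_lo * (mean_efficiency(hi, n_seg, r0) - e_obs) > 0:
        raise ValueError("e_obs is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = mean_efficiency(mid, n_seg, r0) - e_obs
        if f_mid * f_lo > 0:
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, infer_nu(0.6, 60, 6.0) returns the ν at which this sketch reproduces ⟨E⟩ = 0.6 for a hypothetical 60-segment chain with R0 = 6.0 nm (the Cy3B/CF660R value quoted above). For scale, the extreme R values reported above differ by 7.3/4.0 ≈ 1.8, and (7.3/4.0)^3 ≈ 6.1, which is the factor of ∼6 in chain volume.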
It may not be surprising that highly charged sequences rank among the most expanded chains,7,8 and correspondingly, the average net charge per residue (NCPR) correlates with the observed transfer efficiency (Figure 2B). Average Kyte–Doolittle hydrophobicity51 shows remarkably little correlation with transfer efficiency, which may be related to the need for alternative hydrophobicity scales to describe protein phase separation.38,40 Similarly, the transfer efficiency shows little correlation with the charge patterning parameter κ, which is applicable only to sequences with high fractions of charged residues52 (Figure 2B). However, sequence composition clearly influences the chain dimensions: residues whose content correlates with transfer efficiency with a coefficient of determination of ρ² ≥ 0.36 include Gly, Arg, and Val (favoring compaction) as well as Thr and Pro (favoring expansion) (Figure S6).
In view of these correlations, we asked how well the transfer efficiencies correlate with combined compositional biases. For instance, multiple linear regression with the fraction of Gly and κ as regressors yields ρ² = 0.79, i.e., 79% of the variance in the observed transfer efficiencies can be accounted for by this combination alone. Combining the fraction of Gly with κ and the sequence charge decoration (SCD)11 yields an even higher ρ² of 0.93 (Figure 2C). However, a leave-one-out analysis to identify the dominant contributions (Figure S7) shows that individual members of the data set have a large effect on the result; for instance, including sGrich greatly improves ρ² for linear regressions that account for the fractions of Gly or Tyr. These correlation analyses clearly show an important effect of sequence composition on chain dimensions and can thus provide useful clues about which residues or sequence characteristics are relevant. However, the set of IDRs investigated here does not sample sequence space broadly enough to uniquely define chain dimensions based on regression analysis alone. We thus turned to simulations for a more detailed analysis.
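The regression and leave-one-out analysis can be sketched as follows: ordinary least squares with an intercept returns ρ² (the coefficient of determination), and refitting with each IDR omitted in turn shows how strongly single sequences, such as sGrich, drive the result. The function names are hypothetical, the regressors (e.g., fraction of Gly, κ, SCD) must be supplied by the user, and no data from this study are included.

```python
import numpy as np


def ols_r2(X, y):
    """Ordinary least squares with an intercept; returns (R^2, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])  # first column = intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot, coef


def leave_one_out_r2(X, y):
    """R^2 of the same regression refit with each sample omitted in turn."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    scores = []
    for i in range(n):
        keep = np.arange(n) != i
        scores.append(ols_r2(X[keep], y[keep])[0])
    return np.array(scores)
```

A large spread in the leave-one-out values, relative to the full-set ρ², signals that a few influential sequences dominate the fit, which is the situation described above.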
Atomic-Level Characterization of Conformational Ensembles from Simulations
Key discoveries regarding sequence-ensemble relationships of IDPs have been made using atomistic simulations based on the ABSINTH implicit solvation model and force field paradigm.41 In the ABSINTH model, polypeptides and solution ions are represented in atomistic detail, and the aqueous solvent is modeled implicitly. Measured free energies of solvation serve as a benchmark against which solvation inhomogeneities are calibrated. These inhomogeneities are estimated from solvent-accessible volumes, and the changes to solvation are balanced by changes to the effective charge, which is an efficient way of capturing dielectric inhomogeneities. To compare end-to-end distances from ABSINTH simulations to those inferred from FRET measurements, an atomistic representation of the fluorophores was included based on a rotamer library that takes into account dye configurations that are sterically allowed (see the Materials and Methods section). To compare FRET efficiencies from simulations and measurements (Figure 3A), we computed the concordance correlation coefficient54 (ρc), which combines correlation (precision) and deviation from perfect concordance (accuracy), yielding ρc = 0.46 for the ABSINTH ensembles.
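A minimal sketch of the concordance correlation coefficient, assuming Lin's definition with population (biased) moments, ρc = 2 cov(x, y) / [var(x) + var(y) + (x̄ − ȳ)²]; the function name is hypothetical.

```python
import numpy as np


def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient using population moments."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    # np.var defaults to ddof=0, matching the population covariance above
    return float(2.0 * cov / (x.var() + y.var() + (mx - my) ** 2))
```

For x = [1, 2, 3] and y = [2, 3, 4], the Pearson correlation is 1, but ρc = 4/7 because the constant offset is penalized; this is why ρc reports accuracy as well as precision.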
Figure 3.
ABSINTH simulations provide atomic-level characterizations of conformational ensembles. (A) Correlation between the mean transfer efficiencies, ⟨E⟩, from experiment and unweighted ABSINTH simulations (ρc: concordance correlation coefficient). (B) Kullback–Leibler divergence quantifying the deviations between the unweighted (prior) and reweighted (posterior) ensembles obtained with ABSINTH, plotted as a function of the fraction of Gly and hydrophobic residues minus the fraction of charged residues (FCR) of the sequences. Minimal deviations have a KL divergence of ≤0.1. Based on this cutoff, five of the 16 sequences, characterized by high Gly or high aliphatic content and lower charge content, require substantial reweighting (sPTBP, dTRBP, sGrich, sNrich, and dGrich). (C) Correlation between ⟨E⟩ from experiment and reweighted ABSINTH simulations for sequences with KL divergence <0.1. (D) Average inter-residue distances from reweighted ABSINTH simulations (minimum spacing of 10 amino acids), relative to the value from the best fit of a homopolymer model (see color scale) determined by R_{i,j} = ⟨r_{i,j}^2⟩^{1/2} = A_0|i – j|^ν, where r_{i,j} is the spatial distance between residues i and j, A_0 is an adjustable prefactor that reports on the chain persistence length, |i – j| is the linear sequence separation between residues i and j, and ν is the scaling exponent. Regions of local expansion relative to the equivalent homopolymer are shown in red, and areas of local compaction are in blue. The insets show ensemble-averaged inter-residue distances, ⟨R_{i,j}⟩ = ⟨⟨r_{i,j}^2⟩^{1/2}⟩ (in Å), versus |i – j| (colored line). The best homopolymer fit is shown as a dashed gray line. Here, the double average implies averaging over all pairs of residues i and j that are |i – j| apart in the sequence (the outer average), and over the spatial separations, r_{i,j}, between specific pairs of residues across all conformations in the ensemble (the inner average). Only IDRs with a KL divergence ≤0.1 are shown.
(E) Contact networks illustrate different interaction preferences for different IDRs. Residues are shown as nodes, with the circle size related to the mean contact probability between that residue and all other residues more than two residues away in sequence. Edges are drawn between two residues if their contact probability is at least 35% of the maximum contact probability observed for that IDR. Specifically, edges are shown for mean contact probabilities between 0.27 and 0.76, 0.12 and 0.35, and 0.2 and 0.57 for dκh, sCh, and sκl, respectively. The width of an edge is 10 times the mean contact probability. Here, a contact distance of 10 Å is used, such that charge interactions can be observed. Gly is shown in pink, Ser, Thr, Asn, and Gln in green, Arg, Lys, and His in blue, Asp and Glu in red, Phe, Trp, and Tyr in orange, Met, Val, Ile, Leu, and Ala in gray, Pro in purple, and Cys in lime green. Edge colors are mixtures of interacting residue colors. Representative snapshots are visualized using VMD62 and chosen by finding the frame that has the highest weight with a radius of gyration (Rg) within 0.5 Å of the average Rg for the IDR. Contact networks for the remaining proteins are shown in Figure S10.
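The contact networks in panel E follow from a thresholding rule that can be sketched as below. The sketch assumes one representative coordinate per residue (an assumption; atomistic analyses may instead use minimum interatomic distances), coordinates in Å with shape (frames, residues, 3), and optional per-frame weights from reweighting. It computes contact probabilities with a 10 Å cutoff, node sizes from contacts with residues more than two positions away, and edges at ≥35% of the largest contact probability. Function names are hypothetical.

```python
import numpy as np


def contact_probabilities(coords, cutoff=10.0, weights=None):
    """Weighted fraction of frames in which each residue pair is closer than cutoff (Å)."""
    coords = np.asarray(coords, dtype=float)  # shape (frames, residues, 3)
    diff = coords[:, :, None, :] - coords[:, None, :, :]
    contact = (np.sqrt((diff ** 2).sum(axis=-1)) < cutoff).astype(float)
    if weights is None:
        weights = np.full(coords.shape[0], 1.0 / coords.shape[0])
    weights = np.asarray(weights, dtype=float) / np.sum(weights)
    return np.tensordot(weights, contact, axes=1)  # shape (residues, residues)


def node_sizes(prob, min_sep=3):
    """Mean contact probability of each residue with residues >= min_sep apart."""
    n = prob.shape[0]
    idx = np.arange(n)
    far = np.abs(idx[:, None] - idx[None, :]) >= min_sep
    return np.array([prob[i, far[i]].mean() if far[i].any() else 0.0 for i in range(n)])


def network_edges(prob, min_sep=3, frac_of_max=0.35):
    """Edges (i, j, p) with p >= frac_of_max * the largest probability among kept pairs."""
    i, j = np.triu_indices(prob.shape[0], k=min_sep)
    p = prob[i, j]
    if p.size == 0 or p.max() == 0.0:
        return []
    keep = p >= frac_of_max * p.max()
    return [(int(a), int(b), float(c)) for a, b, c in zip(i[keep], j[keep], p[keep])]
```

With frac_of_max = 0.35 and a maximum contact probability of 0.76, the lowest drawn edge is about 0.27, matching the range quoted for dκh.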
ABSINTH thus captures some of the overall trends in the FRET measurements, but there are clear deviations from the experimental results. To better understand where ABSINTH succeeds and fails for the current data set, we reweighted the conformational ensembles as described previously12 (Figure 3C) and compared the unweighted (prior) ensembles to the reweighted (posterior) ensembles. This analysis allowed us to assess whether specific sequence features mandate substantial reweighting of the ABSINTH-derived ensembles to match the measured FRET efficiencies. We quantified the difference in terms of the Kullback–Leibler (KL) divergence between the unweighted and reweighted ensembles (Figure 3B), which is below 0.06 for 11 of the 16 sequences and greater than 0.1 for the remaining five. The largest divergences are observed for the two Gly-rich sequences and the Asn-rich sequence; the two sequences with large fractions of aliphatic residues also show KL divergences above 0.1. Omitting the sequences with KL divergences above 0.1 from the analysis yields ρc = 0.83 for the unweighted and ρc = 0.99 for the reweighted ensembles (Figure 3C).
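To make the prior/posterior comparison concrete, the sketch below uses single-observable maximum-entropy reweighting: starting from uniform (unweighted) frame weights, it finds w_i ∝ exp(−λ E_i) such that the weighted mean efficiency equals the measured value, and then reports the KL divergence Σ w_i ln(w_i / w_i⁰). This is an assumed, simplified scheme chosen for illustration and may differ from the reweighting procedure of ref 12; the per-frame efficiencies and function names are hypothetical.

```python
import numpy as np


def maxent_weights(obs, target, lam_lo=-200.0, lam_hi=200.0, tol=1e-10):
    """Weights w_i ∝ exp(-lam * obs_i) (uniform prior) with sum_i w_i obs_i = target."""
    obs = np.asarray(obs, dtype=float)

    def weights(lam):
        a = -lam * obs
        w = np.exp(a - a.max())  # shift exponent for numerical stability
        return w / w.sum()

    def mean(lam):
        return float(weights(lam) @ obs)

    # d<obs>/d(lam) = -Var_w(obs) <= 0, so the weighted mean decreases with lam
    if not mean(lam_hi) <= target <= mean(lam_lo):
        raise ValueError("target is outside the range reachable by reweighting")
    while lam_hi - lam_lo > tol:
        mid = 0.5 * (lam_lo + lam_hi)
        if mean(mid) > target:
            lam_lo = mid
        else:
            lam_hi = mid
    return weights(0.5 * (lam_lo + lam_hi))


def kl_divergence(w_post, w_prior):
    """KL(posterior || prior) = sum_i w_i ln(w_i / w0_i), skipping zero-weight frames."""
    w_post = np.asarray(w_post, dtype=float)
    w_prior = np.asarray(w_prior, dtype=float)
    mask = w_post > 0
    return float(np.sum(w_post[mask] * np.log(w_post[mask] / w_prior[mask])))
```

The larger the gap between the unweighted ⟨E⟩ and the measured value, relative to the spread of per-frame efficiencies, the larger the KL divergence needed to close it, which is the sense in which high-KL sequences indicate shortcomings of the prior ensemble.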
The most compact IDR observed experimentally, sGrich, contains several Gly-rich stretches. Water is a poor solvent for polyglycine,20,55 polyglutamine,6 and other types of polar tracts.56 The preference of Gly-57 and Gln-rich sequences58 for collapsed conformations and their low solubility have been predicted using all-atom simulations59 with different types of force fields,60 and observed experimentally.20 For Gly- and Asn-rich sequences, the challenges in ABSINTH arise from the delicate interplay between favorable hydration of the polar backbone and side chains and favorable intrachain interactions between polar groups, even though ABSINTH performs well for Gln-rich and mixed Gln- and Asn-rich sequences.61
Before analyzing the reweighted ensembles in detail, we computed three parameters that quantify the ensemble-averaged global sizes and shapes of each IDR from the ABSINTH-derived prior and posterior ensembles (Figure S8). First, we quantified the correspondence for the global radius of gyration (Rg). Overall, the deviations are minimal, with the two largest outliers being the Gly-rich sequences (Figure S8). Next, we assessed how the overall shape changes upon reweighting by computing the ensemble-averaged asphericity (Figure S8). Here, we observed somewhat larger deviations than for Rg, but the general consistency in compaction and expansion suggests that the sequence controls the local interactions and deviations from a homopolymer. Finally, we compared the root-mean-squared end-to-end distances, R, for unweighted versus reweighted ensembles and found that the sequences with high KL divergence fall outside the 95% confidence interval. Across the three parameters, the deviation between unweighted and reweighted ensembles is largest for R.
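A minimal sketch of the three global descriptors, assuming one coordinate per residue (e.g., Cα, an assumption), an array of shape (frames, residues, 3), and optional reweighting weights. The asphericity uses the common gyration-tensor definition δ = 1 − 3 I₂/I₁², where I₁ and I₂ are the sum and the sum of pairwise products of the tensor eigenvalues, averaged over frames; Rg is reported as a root-mean-square value. These choices and the function name are assumptions made for illustration.

```python
import numpy as np


def global_descriptors(coords, weights=None):
    """Return (rms Rg, mean asphericity, rms end-to-end distance) for an ensemble."""
    coords = np.asarray(coords, dtype=float)  # shape (frames, residues, 3)
    n_frames, n_res, _ = coords.shape
    w = np.full(n_frames, 1.0 / n_frames) if weights is None else np.asarray(weights, float)
    w = w / w.sum()
    centered = coords - coords.mean(axis=1, keepdims=True)
    # gyration tensor per frame: S = (1/n) sum_k (r_k - r_cm)(r_k - r_cm)^T
    S = np.einsum("fki,fkj->fij", centered, centered) / n_res
    eig = np.linalg.eigvalsh(S)  # shape (frames, 3)
    i1 = eig.sum(axis=1)  # Rg^2 of each frame
    i2 = eig[:, 0] * eig[:, 1] + eig[:, 1] * eig[:, 2] + eig[:, 2] * eig[:, 0]
    asphericity = 1.0 - 3.0 * i2 / i1 ** 2  # 0 for spherical, 1 for rod-like
    ree2 = ((coords[:, -1] - coords[:, 0]) ** 2).sum(axis=1)
    return float(np.sqrt(w @ i1)), float(w @ asphericity), float(np.sqrt(w @ ree2))
```

Calling the function once without weights and once with the reweighted frame weights gives the prior and posterior values that are compared in Figure S8.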
The preceding analysis suggests that robust inferences regarding conformational ensembles can be drawn from either the reweighted or unweighted ABSINTH simulations for 11 out of the 16 sequences, which we analyze in more detail (Figure 3D). We quantified how the average inter-residue distances, ⟨R_{i,j}⟩ = ⟨⟨r_{i,j}^2⟩^{1/2}⟩, between residues i and j scale with their sequence separation, |i – j| (insets, Figure 3D). To determine how well the conformational ensembles can be described by homopolymer models, the standard polymer relationship ⟨R_{i,j}⟩ = A_0|i – j|^ν was fit to extract A_0 and the apparent scaling exponent, ν (insets in Figure 3D, dashed gray line). Although the value of ν is not meaningful for internal scaling profiles that plateau, this type of comparison can still reveal nonuniform interactions along a chain. We therefore examined the distances between residues normalized by the best-fit homopolymer models (Figure 3D). All IDRs deviate from the uniform length scaling expected for homopolymers, with regions along the sequence that are compacted (blue) or expanded (red) relative to the IDR-specific homopolymer model. Additionally, the degree of compaction or expansion relative to an equivalent homopolymer reference can vary along the sequence (Figure 3D, Figure S9), highlighting the heterogeneity of interactions within each sequence. Overall, these results suggest that even though global features, such as end-to-end distance, can be correlated with sequence composition, atomistic simulations provide more detailed insights into the heteropolymeric properties of the IDRs.
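The internal-scaling analysis can be sketched in four steps: build the map R_{i,j} = ⟨r_{i,j}^2⟩^{1/2} from (optionally weighted) frames, average it over pairs with equal |i – j|, fit ⟨R_{i,j}⟩ = A_0|i – j|^ν, and divide the map by the fitted reference, so that values above 1 mark local expansion and values below 1 local compaction. The sketch assumes one coordinate per residue and fits by linear regression in log-log space, which weights the data differently from a nonlinear least-squares fit; the function names and the min_sep option are hypothetical.

```python
import numpy as np


def rms_distance_map(coords, weights=None):
    """R_ij = <r_ij^2>^(1/2) over (optionally weighted) frames; coords (frames, n, 3)."""
    coords = np.asarray(coords, dtype=float)
    n_frames = coords.shape[0]
    w = np.full(n_frames, 1.0 / n_frames) if weights is None else np.asarray(weights, float)
    w = w / w.sum()
    d2 = ((coords[:, :, None, :] - coords[:, None, :, :]) ** 2).sum(axis=-1)
    return np.sqrt(np.tensordot(w, d2, axes=1))


def internal_scaling(dist):
    """<R_ij> averaged over all pairs with separation s = |i - j|, for s = 1..n-1."""
    seps = np.arange(1, dist.shape[0])
    prof = np.array([np.diagonal(dist, offset=s).mean() for s in seps])
    return seps, prof


def fit_homopolymer(seps, prof, min_sep=1):
    """Fit <R_ij> = A0 |i - j|^nu by linear regression of log <R_ij> on log |i - j|."""
    keep = seps >= min_sep
    nu, log_a0 = np.polyfit(np.log(seps[keep]), np.log(prof[keep]), 1)
    return float(np.exp(log_a0)), float(nu)


def normalized_map(dist, a0, nu):
    """R_ij divided by the homopolymer reference A0 |i - j|^nu (diagonal set to NaN)."""
    idx = np.arange(dist.shape[0])
    sep = np.abs(idx[:, None] - idx[None, :])
    out = np.asarray(dist, dtype=float) / (a0 * np.where(sep > 0, sep, 1) ** nu)
    np.fill_diagonal(out, np.nan)
    return out
```

For a perfect homopolymer, the normalized map is 1 everywhere off the diagonal; sequence-specific interactions appear as blocks of values above or below 1.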
Optimizing a Coarse-Grained IDP Model with Explicit Fluorophores
An alternative approach to describing sequence-ensemble relationships is to use the experimental data to variationally optimize the parameters of the simulation model itself.15,39 Coarse-grained implicit solvent models, which have only a limited set of adjustable parameters, are naturally suited to this approach.15,16,37,39,40 In this case, the amino acids are represented by beads with the appropriate volumes and charges, whose interactions are accounted for via a screened Coulomb potential and a residue-specific short-range potential that represents interactions such as hydrophobicity or hydrogen bonding in addition to the excluded volume. Force field parameters for such models have previously been optimized based on statistical potentials and/or by comparison with experimental data.37,38,40 However, in general, the experimental data employed have been collected under different solution conditions (e.g., different temperatures, pHs, salt concentrations), complicating their coherent use for model refinement. Data sets comprising a large sequence diversity of IDRs measured under identical solution conditions are therefore expected to be useful for benchmarking and refining simulation models and parameters that are transferable to a wide range of IDPs.
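The functional form described above can be made concrete. A common choice in HPS-type models, assumed here rather than taken from this study's Methods, is the Ashbaugh–Hatch potential, a Lennard-Jones potential whose attractive part is scaled by the residue-pair parameter λ, combined with a Debye–Hückel screened Coulomb term. The sketch assumes ε = 0.8368 kJ/mol, arithmetic-mean combining rules for σ and λ, a relative permittivity of 80, and a placeholder Debye length of 1 nm; distances are in nm and energies in kJ/mol, and no cutoffs or shifts are applied. Function names are hypothetical.

```python
import numpy as np

EPS_AH = 0.8368         # kJ/mol; assumed value typical of CALVADOS-type models
K_COULOMB = 138.935458  # e^2 / (4 pi eps0) in kJ mol^-1 nm e^-2


def pair_params(sigma_i, sigma_j, lam_i, lam_j):
    """Assumed arithmetic-mean combining rules for sigma and lambda."""
    return 0.5 * (sigma_i + sigma_j), 0.5 * (lam_i + lam_j)


def ashbaugh_hatch(r, sigma, lam, eps=EPS_AH):
    """LJ + (1 - lam) eps inside the LJ minimum, lam * LJ outside it."""
    r = np.asarray(r, dtype=float)
    lj = 4.0 * eps * ((sigma / r) ** 12 - (sigma / r) ** 6)
    r_min = 2.0 ** (1.0 / 6.0) * sigma
    return np.where(r <= r_min, lj + (1.0 - lam) * eps, lam * lj)


def debye_huckel(r, q_i, q_j, debye_length=1.0, eps_r=80.0):
    """Screened Coulomb energy between charges q_i and q_j (in units of e)."""
    r = np.asarray(r, dtype=float)
    return K_COULOMB * q_i * q_j / (eps_r * r) * np.exp(-r / debye_length)
```

With λ = 0 the short-range term is purely repulsive (excluded volume only), and with λ = 1 it becomes a full Lennard-Jones potential; the optimized λ values in Figure 4A interpolate between these limits for each residue or dye bead type.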
We thus employed the experimental results from our 16 IDRs to identify the residue-specific short-range interaction parameters for a hydrophobicity scale (HPS) model38,63 that best describe the entire data set. To this end, we use the values of Tesei et al.40 as a starting point and employ the force balance approach,38,64,65 where the short-range interaction parameters λ are iterated to optimize agreement between the simulated and experimental transfer efficiencies. A particular advantage of this method is that the fluorophores and their interactions with the rest of the sequence can be included in the model explicitly and parametrized with the same strategy. This approach thus enables interactions of the fluorophores to be taken into account that go beyond the excluded volume effects most commonly considered in accessible volume66,67 and rotamer library50,68−71 schemes. We chose a dye representation that reflects the size, structure, and charge distribution of the different fluorophores and consists of charged, uncharged, and dye linker beads (Figure S5).
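The quantity that such an iteration drives toward zero can be written compactly. A typical choice, assumed here for illustration, is a weighted sum of squared deviations between simulated and measured ⟨E⟩ over all labeled IDRs and dye pairs k, with hypothetical uncertainties σ_k; its gradient with respect to a bead parameter λ_a follows from the standard fluctuation formula for canonical averages, so it can be estimated from a single simulation per parameter set. Actual force-balance implementations may add regularization that restrains λ near its starting values.

```latex
\chi^2(\boldsymbol{\lambda}) = \sum_{k=1}^{M}
  \frac{\left(\langle E\rangle_k^{\mathrm{sim}}(\boldsymbol{\lambda}) - \langle E\rangle_k^{\mathrm{exp}}\right)^2}{\sigma_k^2},
\qquad
\frac{\partial \chi^2}{\partial \lambda_a} = \sum_{k=1}^{M}
  \frac{2\left(\langle E\rangle_k^{\mathrm{sim}} - \langle E\rangle_k^{\mathrm{exp}}\right)}{\sigma_k^2}
  \,\frac{\partial \langle E\rangle_k^{\mathrm{sim}}}{\partial \lambda_a}

\frac{\partial \langle E\rangle_k}{\partial \lambda_a} =
  -\beta\left(\left\langle E\,\frac{\partial U}{\partial \lambda_a}\right\rangle_k
  - \langle E\rangle_k \left\langle \frac{\partial U}{\partial \lambda_a}\right\rangle_k\right),
\qquad \beta = \frac{1}{k_{\mathrm{B}}T}

\frac{\partial U_{\mathrm{AH}}}{\partial \lambda_{ij}} =
  \begin{cases}
    -\varepsilon, & r \le 2^{1/6}\sigma_{ij} \\
    U_{\mathrm{LJ}}(r), & r > 2^{1/6}\sigma_{ij}
  \end{cases}
```

The fluctuation formula assumes that the efficiency E depends only on the coordinates, not on λ. If the short-range potential has the Ashbaugh–Hatch form with λ_ij = (λ_i + λ_j)/2, each pair containing bead type a contributes ½ ∂U_AH/∂λ_ij per occurrence of a to ∂U/∂λ_a.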
With the optimized parameters for the HPS model (Figure 4A), we obtained correlations with ρc = 0.91 between experiment and simulation for the IDRs labeled with either dye pair (Figure 4B), compared with ρc = 0.70 and 0.64 before optimization. The short-range interaction parameters for the amino acids after optimization are reasonably close to the starting values, which were obtained previously based on experimental data and hydrophobicity scales of amino acids;40 indeed, the new parameters yield similar results when benchmarked against the original CALVADOS training set40 (Figure S11). The increased value for Gly reflects the pronounced compaction of Gly-rich sequences we observe experimentally, and the increased values for Arg and Tyr are in line with previous results suggesting an important role of these residues for chain compaction and phase separation.16,40,73−76 We note, however, that while the CALVADOS parameters slightly underestimate compaction for Gly-rich sequences in the present data set, our HPS parameters slightly overestimate compaction for the longer Gly-rich sequences in the CALVADOS training set, a conflict that likely points to limitations of the form of the HPS model itself. Importantly, by essentially treating the fluorophores as part of the sequence, both the results for the Alexa dye pair and Cy3B/CF660R can be described well. Although a detailed rationalization of the resulting short-range interaction parameters for the fluorophores based on their chemical structure is challenging at this level of coarse-graining, the larger values for the Alexa dyes are in accord with the stronger dye-peptide interactions observed in the SAXS and NMR data compared to Cy3B/CF660R (Figure S6).
Figure 4.
Coarse-grained simulations consistently describe amino acid and dye interactions. (A) Initial (orange for amino acids and light orange for dyes), optimized (blue), and CALVADOS 2 (ref 72; gray) values of the short-range interaction parameter (λ) for each amino acid, dye (Alexa 488/594 and Cy3B/CF660R; Neg: negatively charged, Neu: neutral, Pos: positively charged), and dye linker (Lin) beads. Note that the initial values for Ala and Asp are too small to be visible on this scale (Table S3). (B) Correlation between the mean transfer efficiencies from experimental data and from resulting coarse-grained simulations for both dye pairs used. The dashed line is the identity line, and ρc is the concordance correlation coefficient. (C) SAXS curves of selected IDRs (gray lines) compared to results based on the simulations (colored lines). For each IDR, results for unlabeled protein are shown in the left plot, and results for protein labeled with Alexa 488 at both Cys residues are shown in the right plot. The simulation value at a q of 0 was used for normalization of both experiments and simulations.
We further tested the optimized HPS parameters by calculating scattering curves from coarse-grained simulations of several IDRs with and without fluorophores and comparing them with SAXS measurements of labeled and unlabeled sNh+, dCh–, sNrich, and dTRBP; we found reasonable agreement (Figure 4C), further validating the model. As additional tests on independent sequences, we used previously published SAXS data on a series of IDPs12 and on a mutationally destabilized variant of the β-helix protein PNt labeled with zero, one, or two copies of Alexa 488;77 overall, these data are well described by our model. In the case of PNt, the results indicate a slight decrease in radius of gyration upon attaching one or two dyes (although limitations in SAXS data quality yield a large uncertainty for the double-labeled variant, Figure S12A). For the same PNt variant labeled with Alexa 488 and 594 (at the same sites as in the Alexa 488 double-labeled variant77), the measured transfer efficiency (⟨E⟩ = 0.6; Figure S12B) is reasonably reproduced by the HPS model (⟨E⟩ = 0.52). Altogether, the optimized coarse-grained model thus provides a residue-specific way of quantifying the dimensions of disordered proteins. Moreover, the fluorophores can be incorporated and their interactions parametrized within the same framework, treating them essentially as an additional set of residues, so that the effect of the FRET dyes on chain dimensions can be predicted, and distances and distance distributions between the dyes can be obtained directly from the simulations and compared to experiment.
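Scattering curves can be computed from coarse-grained trajectories with the Debye formula, I(q) = ⟨Σ_i Σ_j f_i f_j sin(q r_ij)/(q r_ij)⟩, and normalized by I(0) as in Figure 4C. The sketch below assumes uniform bead form factors unless others are supplied, coordinates in nm, and q in nm⁻¹; it ignores the hydration layer and residue-specific form factors, so it is a simplified illustration and not the procedure used for Figure 4C. The function name is hypothetical.

```python
import numpy as np


def debye_intensity(coords, q, form_factors=None):
    """Frame-averaged Debye intensity I(q) = <sum_ij f_i f_j sin(q r_ij) / (q r_ij)>."""
    coords = np.asarray(coords, dtype=float)  # shape (frames, beads, 3), nm
    q = np.asarray(q, dtype=float)  # 1D array of scattering vector magnitudes, nm^-1
    n = coords.shape[1]
    f = np.ones(n) if form_factors is None else np.asarray(form_factors, dtype=float)
    ff = f[:, None] * f[None, :]
    intensity = np.zeros(q.shape)
    for frame in coords:
        r = np.sqrt(((frame[:, None, :] - frame[None, :, :]) ** 2).sum(axis=-1))
        # np.sinc(t) = sin(pi t) / (pi t), so sin(q r) / (q r) = np.sinc(q r / pi), equal to 1 at r = 0
        intensity += np.einsum("ij,kij->k", ff, np.sinc(q[:, None, None] * r[None] / np.pi))
    return intensity / coords.shape[0]
```

With q[0] = 0, dividing the result by its first element gives the I(q)/I(0) normalization used in the figure; dye beads can be included simply as additional rows of coords with their own form factors.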
Finally, we compared the end-to-end distance distributions resulting from the three different methods employed: the analytical SAW-ν polymer model, the optimized coarse-grained HPS model, and the reweighted atomistic ABSINTH simulations (Figure 2A). The distributions are similar in all cases, indicating that the different approaches are consistent at the level of overall chain dimensions. For a more detailed comparison, we calculated distance maps from the HPS simulations and the reweighted ABSINTH simulations (Figure S13). This analysis reveals three groups of sequences. For nine of the 16 sequences, the normalized distance maps from the reweighted ABSINTH and HPS models are strongly positively correlated (Pearson correlation coefficient ρ > 0.6); for four sequences, they are weakly positively correlated (0.3 < ρ < 0.5); and for three sequences (sGrich, sPTBP, and sNrich), they are anticorrelated (ρ < 0). These three are also the IDRs whose ABSINTH ensembles required extensive reweighting (Figure 3B); for most of the IDRs, however, the intrachain interactions predicted by the two models are similar. The discrepant cases highlight the challenge of balancing chain–solvent and intrachain interactions to arrive at a consistent description of ensembles for Gly- and Asn-rich sequences,61 and for sequences such as sPTBP, where secondary structure preferences in the ABSINTH model cause deviations. Indeed, the circular dichroism spectra of sPTBP and some other sequences (Figure S3) show hints of residual secondary structure.
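The map comparison above can be sketched as a Pearson correlation over the upper triangle of two symmetric residue–residue maps, so that each pair i < j enters once. The `min_separation` cutoff and the label names are assumptions for illustration; the thresholds mirror the groups reported in the text, which leave gaps (for example 0.5 to 0.6) that simply stay unclassified here.

```python
import numpy as np


def map_correlation(map_a, map_b, min_separation=1):
    """Pearson rho between two residue-residue maps, using each pair with j - i >= min_separation once."""
    a = np.asarray(map_a, dtype=float)
    b = np.asarray(map_b, dtype=float)
    iu = np.triu_indices(a.shape[0], k=min_separation)
    return float(np.corrcoef(a[iu], b[iu])[0, 1])


def classify(rho):
    """Groups used in the text; values falling between them are left unclassified."""
    if rho > 0.6:
        return "strong positive"
    if 0.3 < rho < 0.5:
        return "weak positive"
    if rho < 0:
        return "anticorrelated"
    return "unclassified"
```

Because Pearson's ρ is invariant to linear rescaling, two maps that differ only by an overall expansion factor still correlate perfectly; the comparison therefore probes the pattern of local compaction and expansion, not global size.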
Discussion
We investigated a set of 16 IDRs selected from linker sequences of naturally occurring proteins with identical lengths but very different sequence compositions to probe the sequence dependence of the conformational ensembles of disordered proteins. Notably, since all sequences investigated here originate from the linker regions between RNA-binding domains, their chain dimensions may have been an evolutionary factor contributing to the average distance between the domains44 and their interaction with RNA. The experimental results, consisting of single-molecule FRET efficiencies measured with two different dye pairs and complemented with SAXS and NMR, serve as a benchmark and provide an opportunity for systematically refining simulation models and force field parameters. Here, we tested and compared three approaches at very different levels of coarse-graining for modeling the conformational ensembles of 16 IDRs.
Analytical homopolymer models are a simple and useful way of interpreting experimental data in terms of distance distributions and effective length scaling exponents,17,18 but they cannot resolve heterogeneities in local compaction or expansion along the sequence, and their predictive power is limited. Simple correlations between chain dimensions and sequence composition also offer useful indications of the effect of individual residues on chain compaction, but simulations provide much more detail on the heteropolymer properties of disordered proteins. Our application of two different simulation approaches demonstrates the complementarity of atomistic and coarse-grained models. All-atom implicit-solvent simulations using ABSINTH41,61,78 in combination with ensemble reweighting12 enable detailed analyses of residue-specific intrachain interaction networks that affect chain compaction and the resulting deviations from simple homopolymer models (Figure 3). Coarse-grained models facilitate the optimization of force field parameters to arrive at a transferable model, illustrated here with the HPS model38,40,63 combined with the force balance approach38,64,65 (Figure 4). We further show how FRET dyes can be incorporated and parametrized explicitly in the HPS model to achieve agreement between results using different fluorophores (Figure 4A,B). Overall, our work thus illustrates the mutual benefit of experiment and simulations: experimental data enable the testing and refinement of simulation models, and simulations enable a detailed structural interpretation of the experimental results.
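To make the homopolymer route concrete, the sketch below evaluates a SAW-ν end-to-end distance density of the form P(r) = A(4π/R)(r/R)^(2+g) exp[−B(r/R)^δ], with g = (γ − 1)/ν and δ = 1/(1 − ν), fixes A and B by normalization and ⟨r²⟩ = R², averages the Förster efficiency over it, and inverts a measured ⟨E⟩ for ν. The values γ = 1.1615, R = bN^ν with b = 0.55 nm, and the use of κ² = 2/3 are assumptions taken from earlier SAW-ν work, not parameters stated in this section; `n_bonds` and `R0` must be supplied by the user.

```python
import math

import numpy as np

GAMMA = 1.1615  # assumed universal SAW exponent gamma


def saw_nu_pdf(r, R, nu, gamma=GAMMA):
    """SAW-nu end-to-end distance density P(r) with root-mean-square distance R."""
    g = (gamma - 1.0) / nu
    delta = 1.0 / (1.0 - nu)
    # B fixes <r^2> = R^2 and A normalizes the density; both follow from Gamma-function moments
    B = (math.gamma((5.0 + g) / delta) / math.gamma((3.0 + g) / delta)) ** (delta / 2.0)
    A = delta * B ** ((3.0 + g) / delta) / (4.0 * math.pi * math.gamma((3.0 + g) / delta))
    x = np.asarray(r, dtype=float) / R
    return A * (4.0 * math.pi / R) * x ** (2.0 + g) * np.exp(-B * x ** delta)


def mean_efficiency(R, nu, R0, n_grid=20000):
    """<E> = integral of P(r) E(r) dr with E(r) = 1 / (1 + (r/R0)^6), assuming kappa^2 = 2/3."""
    r = np.linspace(1e-6 * R, 6.0 * R, n_grid)
    p = saw_nu_pdf(r, R, nu)
    e = 1.0 / (1.0 + (r / R0) ** 6)
    # Uniform grid, so the spacing cancels in this ratio of Riemann sums
    return float(np.sum(p * e) / np.sum(p))


def nu_from_efficiency(e_target, n_bonds, R0, b=0.55, lo=0.3, hi=0.7, tol=1e-6):
    """Solve for nu such that R = b * n_bonds**nu reproduces a measured <E> (bisection)."""
    def f(nu):
        return mean_efficiency(b * n_bonds ** nu, nu, R0) - e_target

    # Larger nu gives a larger R and hence a lower <E>, so f decreases with nu
    if f(lo) < 0 or f(hi) > 0:
        raise ValueError("target efficiency not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The single fitted number ν fully determines the distribution, which is exactly why such a model cannot capture sequence-specific local compaction.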
Conclusions
Our results demonstrate that the dimensions of IDRs exhibit a pronounced dependence on amino acid sequence. This result is consistent with a broad range of previous observations,7−10,13,14,16,18,19,24,31,34,79−81 but a noteworthy feature of the current work is that we focused on sequences of identical length, thus ensuring that differences in sequence-ensemble relationships do not arise from differences in chain length, additional structure, or sequence context. The measurements were performed under identical solution conditions (buffer, salt concentration, and pH), which greatly simplifies their direct quantitative comparison. Furthermore, the sequences were selected from natural proteins to ensure the biological relevance of their compositions and to cover a broad range of sequence characteristics, so that many different types of sequence effects can be assessed. The analysis of the results using polymer models and both all-atom and coarse-grained simulations consistently shows that the chain dimensions and conformational properties cover a very broad range. Particularly pronounced contributions to chain compaction come from the content of Gly and aromatic residues, whereas charge repulsion and Pro residues contribute to chain expansion. However, we did not identify a single simple descriptor that captures global chain dimensions; instead, compaction can be driven by different types of interactions in different sequences. From our data, we have derived a coarse-grained model that can be used to predict how such interactions affect the dimensions of other disordered proteins.
Materials and Methods
Sequence Selection and Characterization
To identify disordered protein regions, UniProtKB was searched for proteins containing at least two double-stranded RNA-binding domains,82 and sequences of interdomain linkers were identified that were 50–200 amino acids in length. In order to increase the compositional diversity of the sequences, a second pool of sequences was generated from proteins containing at least two RNA recognition motifs (RRMs).83 All sequences were characterized in terms of the following sequence properties. Normalized hydrophobicity was calculated using the scale of Kyte and Doolittle, which assigns a relative hydrophobicity index, Hi, between −4.5 and +4.5 to each amino acid51
$$\langle H \rangle = \frac{1}{N} \sum_{i=1}^{20} n_i \, \frac{H_i + 4.5}{9}$$
Here, N is the total number of amino acids in the sequence and ni is the number of each of the 20 amino acid types within the polypeptide chain. The normalized hydrophobicity may adopt values between 0 and 1. The fraction of charged residues, FCR, and the net charge per residue, NCPR, were calculated according to Das and Pappu10 as
$$\mathrm{FCR} = f_{+} + f_{-}$$
$$\mathrm{NCPR} = f_{+} - f_{-}$$
where f+ and f– denote the fractions of positively and negatively charged residues, respectively. The charge patterning factor κ and the sequence charge decoration11 (SCD) were calculated as described previously. A total of 16 linker sequences were selected to cover a large region of sequence parameter space. All sequences were shortened to 57 amino acids to rule out length-dependent effects in their comparison, taking care to alter the average sequence properties as little as possible. Sequences containing Trp were excluded to minimize effects from dye quenching, which can complicate the quantitative analysis of FRET experiments.84 Two natural Cys residues in dErich were replaced by Ser, and sκl contains a spontaneous Ser-to-Ile exchange due to the instability of the gene in Escherichia coli. The IDR names were chosen to be suggestive of characteristic sequence properties (“s”: derived from ssRNA-binding proteins, “d”: derived from dsRNA-binding proteins, “h”: high, “l”: low, “N”: net charge, “C”: charge, “+”: positively charged, “–”: negatively charged, “κ”: charge segregation, “Xrich”: enriched in amino acid X). All sequences are shown in Figure 1, and the UniProt codes of the source proteins and the sequence parameters are listed in Table S1.
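The sequence descriptors defined above can be computed in a few lines. This is an independent sketch, not the localCIDER implementation used for the regression features; it assumes Lys and Arg carry +1, Asp and Glu carry −1, and His is neutral. SCD follows the usual definition (1/N) Σm>n qm qn (m − n)^(1/2); κ is omitted because its blob-based definition is longer.

```python
import math

# Kyte-Doolittle hydropathy index H_i for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Assumption: Lys/Arg count as +1, Asp/Glu as -1, His as neutral
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def normalized_hydrophobicity(seq):
    """(1/N) * sum_i n_i (H_i + 4.5) / 9, i.e. the rescaled index averaged over residues."""
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def fcr_ncpr(seq):
    """Fraction of charged residues (f+ + f-) and net charge per residue (f+ - f-)."""
    n = len(seq)
    f_plus = sum(CHARGE.get(aa, 0) > 0 for aa in seq) / n
    f_minus = sum(CHARGE.get(aa, 0) < 0 for aa in seq) / n
    return f_plus + f_minus, f_plus - f_minus


def scd(seq):
    """Sequence charge decoration: (1/N) sum_{m>n} q_m q_n (m - n)^(1/2)."""
    q = [CHARGE.get(aa, 0) for aa in seq]
    total = 0.0
    for m in range(1, len(q)):
        for n in range(m):
            total += q[m] * q[n] * math.sqrt(m - n)
    return total / len(q)
```

Note that FCR and NCPR are blind to the order of residues, whereas SCD weights like-charge pairs by their sequence separation, so scrambling a sequence changes only SCD.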
Multiple linear regression was performed for all single, double, and triple combinations of the following 26 compositional features: Fraction A, D, E, G, K, L, N, P, Q, R, S, T, V, K+R, D+E, Polar, Aliphatic, Aromatic, and Chain Expanding, as well as FCR, |NCPR|, Hydrophobicity, Disorder Promoting, Isoelectric point, κ, and SCD. Compositional features were calculated using localCIDER.85 The fractions of C, F, H, I, M, W, and Y were not considered because each accounts for less than 2.5% of all residues in the linker IDR sequences. Figure S7A shows the ρ² values for all 26 single compositional features, for all double combinations with ρ² > 0.72, and for all triple combinations with ρ² > 0.855. The boxplots in Figure S7A show the distributions of ρ² values over all 16 leave-one-out analyses.
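The combinatorial regression can be sketched with ordinary least squares. In this sketch, the response `y` stands for a per-sequence quantity such as the mean transfer efficiency (an assumption, since the response is not named here), ρ² is the squared Pearson correlation between fitted and observed values, and in the leave-one-out loop ρ² is evaluated on the 15 sequences used for fitting, which is also an assumption about how the boxplots were built.

```python
import itertools

import numpy as np


def fit_predict(X, y):
    """Ordinary least squares with an intercept; returns in-sample predictions."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef


def rho_squared(y, y_pred):
    """Squared Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y, y_pred)[0, 1] ** 2)


def scan_combinations(features, y, k):
    """Fit every k-feature combination; return (names, rho^2) sorted best first."""
    results = []
    for names in itertools.combinations(sorted(features), k):
        X = np.column_stack([features[name] for name in names])
        results.append((names, rho_squared(y, fit_predict(X, y))))
    return sorted(results, key=lambda item: item[1], reverse=True)


def leave_one_out(X, y):
    """rho^2 of the fit to the remaining sequences, once for each left-out sequence."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    scores = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        scores.append(rho_squared(y[keep], fit_predict(X[keep], y[keep])))
    return np.array(scores)
```

With 26 features there are 26 single, 325 double, and 2600 triple combinations; with only 16 data points, the leave-one-out spread is a useful guard against combinations that fit well by chance.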
Protein Expression, Purification, and Fluorescence Labeling
Codon-optimized DNA sequences encoding the IDRs with two terminal Cys residues for site-specific dye labeling were purchased from GeneArt (Regensburg, Germany). Linker IDR sequences were cloned into a pET-20b(+) based plasmid (EMD Millipore), which contained an N-terminal His6-tag, as well as a C-terminal GB1 domain for improved expression fused to a His6-tag, both of which were separated from the IDR of interest via a thrombin cleavage site.86 For all constructs, thrombin cleavage resulted in a residual GSGSC overhang at the N-terminus and a CTLGPR overhang at the C-terminus of the protein.
The IDRs were expressed in E. coli Rosetta (DE3) cells (Merck Biosciences). Cultures were grown to an OD600 of 0.8 in LB medium containing carbenicillin, induced with 1 mM IPTG, and incubated at 20 °C overnight. For the preparation of isotope-labeled proteins, M9 minimal medium containing 15NH4Cl or 15NH4Cl and 13C6-glucose was used instead of LB medium. Cells were harvested, and pellets were resuspended in lysis buffer (100 mM NaH2PO4/Na2HPO4, 10 mM Tris-HCl, 6 M guanidinium chloride (GdmCl), 10 mM imidazole, 1 mM tris(2-carboxyethyl)phosphine (TCEP), pH 8.0). Insoluble cell debris was removed by centrifugation. The soluble fraction was subjected to nickel chelate affinity chromatography (Ni Sepharose excel, GE Healthcare Bio-Sciences). The lysis buffer was used for washing. For elution, the imidazole concentration was increased to 500 mM. Eluates were dialyzed against 50 mM Tris-HCl, 150 mM NaCl, and 10% (v/v) glycerol, pH 8.0, and the total protein concentration was quantified by measuring the absorbance at 280 nm (extinction coefficients: sGrich-GB1, 20,400 M–1 cm–1; dGrich-GB1, 12,950 M–1 cm–1; dκh-GB1, 11,460 M–1 cm–1; all others: 9970 M–1 cm–1). Subsequently, thrombin was added at 20 U/mg, and the proteolytic digest was allowed to proceed for 1–3 h at room temperature. The reaction was quenched by adding 1 g/mL GdmCl. Protein solutions were then concentrated to a total volume of approximately 1 mL using Centriprep 3K centrifugal filter devices (EMD Millipore).
Protein samples were reduced by adding DTT at a final concentration of 10 mM and purified by reversed-phase high-performance liquid chromatography (RP-HPLC) on a C18 column (Reprosil Gold 200, Dr. Maisch GmbH) using 5% acetonitrile, 0.1% (v/v) trifluoroacetic acid (TFA) as buffer A and acetonitrile as buffer B. Eluates were lyophilized overnight and redissolved in 20 mM KH2PO4/K2HPO4, 6 M GdmCl, pH 7.3. Protein concentrations were quantified using a bicinchoninic acid (BCA) assay kit (Thermo Fisher Scientific Inc.) by measuring the absorbance at 562 nm. For fluorescence labeling, Alexa Fluor 488 C5 maleimide (Thermo Fisher Scientific Inc.) or maleimide-functionalized Cy3B (GE Healthcare AG) dissolved in anhydrous N,N-dimethylformamide (DMF) was added at a protein-to-dye molar ratio of 1:0.7. The reaction was allowed to proceed at 4 °C overnight and was quenched by the addition of DTT at a final concentration of 10 mM. Singly labeled protein was separated from unreacted and doubly labeled protein by RP-HPLC (see above), followed by lyophilization. Donor-labeled protein was redissolved in 20 mM KH2PO4/K2HPO4, 6 M GdmCl, pH 7.3. Alexa Fluor 594 C5 maleimide (Thermo Fisher Scientific, Inc.) or maleimide-functionalized CF660R (Biotium, Inc.) dissolved in anhydrous DMF was added in 3-fold molar excess. The reaction was allowed to proceed as described above and was quenched by adding DTT at a final concentration of 10 mM. Donor–acceptor-labeled protein was purified by RP-HPLC (see above), lyophilized, and redissolved in 20 mM KH2PO4/K2HPO4, 6 M GdmCl, pH 7.3. Protein identity and site-specific labeling were confirmed by electrospray ionization mass spectrometry (ESI-MS). The two cysteine residues do not react identically with the fluorophores, and for some IDRs, the labeling permutants were partially separated during purification; in most cases, however, a mixture of permutants was used. In the case of sκh, where the dye permutants could be separated, their transfer efficiencies differed by 0.02, indicating a minor effect on the results. Fluorescently labeled samples were stored at −80 °C until further use.
A synthetic codon-optimized N-terminal 334-amino acid segment of pertactin77 (PNt) with Cys residues in positions 29 and 117 was cloned into a pJ414 vector (ATUM) and transformed into E. coli BL21-DE3 cells (Agilent). Cells were grown at 37 °C in Luria–Bertani medium containing 100 μg mL–1 carbenicillin, and expression was induced at an OD600 of 0.7 for 3 h. The cell pellet derived from 0.5 L of culture was suspended in 70 mL of buffer A [50 mM Tris-HCl, pH 8, 100 mM NaCl, 1 mM ethylenediaminetetraacetic acid (EDTA), 5 mM 2-mercaptoethanol, and 5 mM benzamidine], followed by the addition of lysozyme (100 μg mL–1) and sonication at 4 °C. The insoluble recombinant protein was washed by resuspension in 70 mL of buffer A containing 1% Triton X-100 and subsequently in buffer A without Triton X-100. In all cases, the insoluble fraction was pelleted by centrifugation at 20,000g for 30 min at 4 °C. The final pellet was solubilized in 8 M urea, 50 mM Tris-HCl, pH 8.0, 5 mM EDTA, and 5 mM tris(2-carboxyethyl)phosphine (TCEP), and one-quarter of the protein (∼15 mg) was applied to a Superdex-200 column (1.6 × 60 cm, Cytiva) equilibrated in 50 mM Tris-HCl, pH 8, 4 M GdmCl, 1 mM EDTA, and 1 mM TCEP at a flow rate of 1.4 mL min–1 at ambient temperature. Peak fractions with the highest purity were verified by mass spectrometry and used for the experiments. The double-Cys PNt variant (PNtCC) was labeled with dyes as described for the other IDRs.
Circular Dichroism Spectroscopy
Unlabeled constructs were dialyzed against 20 mM KH2PO4/K2HPO4, 1 mM DTT, pH 7.3, using Slide-A-Lyzer MINI Dialysis Devices, 3.5K MWCO (Thermo Fisher Scientific). Insoluble components were removed by centrifugation. Circular dichroism spectra from 190 to 250 nm were acquired on a spectropolarimeter (J-810, Jasco, or ChiraScan V100, Applied Photophysics) at 22 °C in quartz cells with a path length of 0.5 or 1 mm at concentrations of 0.1–0.5 mg/mL. Peptide concentrations were determined from the absorbance at 214 nm recorded in these scans.87
Single-Molecule Spectroscopy
For single-molecule experiments, the donor–acceptor-labeled IDRs were diluted to approximately 100 pM in 20 mM KH2PO4/K2HPO4, 125 mM KCl, pH 7.3, with 0.001% Tween 20 and either 10 mM DTT (Cy3B/CF660R-labeled IDRs) or 147 mM 2-mercaptoethanol (Alexa-labeled IDRs). The measurements were conducted at 22 °C in chambered cover slides (μ-Slide, ibidi). The excitation sources depended on the fluorophores used. Alexa 488 was excited with an LDH-D-C-485 diode laser (PicoQuant GmbH). Alexa 594 and Cy3B were excited with a supercontinuum fiber laser (SC-450-4, Fianium Ltd.) filtered by a z582/15 or HC543.5/2 band-pass filter, respectively (Chroma Technology). CF660R was excited with an LDH-D-C-640 diode laser (PicoQuant GmbH). The lasers were operated at a pulse repetition rate of 20 MHz to achieve pulsed interleaved excitation of donor and acceptor.88 Fluorescence photons were collected with a UplanApo 60x/1.20W objective (Olympus) and passed through a suitable multiband mirror and a 100-μm confocal pinhole. Photons were then separated according to polarization with a polarizing beam splitter and according to wavelength with suitable dichroic mirrors. Finally, photons were filtered by optical band-pass filters and detected by avalanche photodiodes. Photon arrival times were recorded with a HydraHarp 400 time-correlated single-photon counting system (PicoQuant) at a time resolution of 16 ps.
Photon bursts emitted by labeled IDRs diffusing through the confocal volume were identified as contiguous intervals of emission with interphoton times below 150 μs.89 FRET efficiency histograms shown in Figure 2A are based on a threshold of 50 photons per burst. Dual-channel-burst-search90 was applied to avoid artifacts from bleaching and blinking. Subsequently, bursts were corrected for differences in chromophore quantum yields, differences in detection efficiency of the detectors and spectral crosstalk obtained from measurements of free dye solutions, and direct acceptor excitation and background signal.91 The stoichiometry ratio88,92 of a photon burst was calculated according to
$$S = \frac{n_{\mathrm{tot,Dex}}}{n_{\mathrm{tot,Dex}} + n_{\mathrm{tot,Aex}}}$$
where ntot,Dex and ntot,Aex denote the corrected total number of photons emitted after donor or acceptor excitation, respectively. Bursts with 0.2 < S < 0.8 were used to calculate the transfer efficiency
$$E = \frac{n_A}{n_A + n_D}$$
where nD and nA are the corrected donor and acceptor photon counts, respectively, emitted upon donor excitation within a burst. Alternatively, the correction factors were inferred from measurements of the IDRs with alternating excitation.39,92 Figure 2 and Table S2 show the average of the mean transfer efficiencies from at least three independent measurements. The average standard deviations are ±0.014 for measurements with the Alexa dye pair and ±0.024 for those with Cy3B/CF660R; most of these measurements were taken over the course of several years and on different instruments. Fluorescence polarization anisotropies were <0.1 for all samples and fluorophores, indicating that the orientational factor κ² in Förster theory can be approximated by 2/3 owing to rapid orientational averaging of donor and acceptor.53 Data analysis was performed using the Mathematica (Wolfram Research) package Fretica (https://github.com/SchulerLab).
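The burst selection and efficiency calculation described above can be sketched as follows: photons are grouped into bursts wherever consecutive interphoton times stay below 150 μs, bursts with fewer than 50 photons are dropped, and the stoichiometry filter 0.2 < S < 0.8 removes donor-only and acceptor-only molecules before E is computed. This simplified sketch uses a single-channel search rather than the dual-channel burst search, and it assumes the photon counts passed to `burst_efficiencies` are already corrected; it is not the Fretica implementation.

```python
import numpy as np


def find_bursts(arrival_times_us, max_gap_us=150.0, min_photons=50):
    """Split a photon stream into bursts of contiguous photons with gaps < max_gap_us."""
    t = np.asarray(arrival_times_us, dtype=float)
    if t.size == 0:
        return []
    breaks = np.flatnonzero(np.diff(t) >= max_gap_us) + 1  # index of first photon after a gap
    starts = np.concatenate(([0], breaks))
    stops = np.concatenate((breaks, [t.size]))
    return [(int(s), int(e)) for s, e in zip(starts, stops) if e - s >= min_photons]


def burst_efficiencies(n_d, n_a, n_aex, s_min=0.2, s_max=0.8):
    """Per-burst E for bursts whose stoichiometry S lies within (s_min, s_max).

    n_d, n_a: corrected donor/acceptor counts after donor excitation;
    n_aex: corrected acceptor counts after acceptor excitation.
    """
    n_d, n_a, n_aex = (np.asarray(v, dtype=float) for v in (n_d, n_a, n_aex))
    n_dex = n_d + n_a                    # total photons after donor excitation
    s = n_dex / (n_dex + n_aex)
    keep = (s > s_min) & (s < s_max)     # rejects donor-only and acceptor-only bursts
    return n_a[keep] / (n_a[keep] + n_d[keep])
```

The mean of the returned array is a crude estimate of ⟨E⟩; in practice, the mean transfer efficiency is usually taken from the efficiency histogram after all corrections.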
NMR Spectroscopy
All data were acquired on Bruker Avance 600 MHz spectrometers equipped with TCI triple-resonance cryogenic probes and pulsed-field gradient units. All spectra were referenced directly using DSS for the 1H dimension; 13C and 15N frequencies were referenced indirectly. Samples were dissolved in a buffer identical to that used for smFRET measurements (20 mM KH2PO4/K2HPO4, 0.125 M KCl, 10 mM DTT, pH 7.3). For backbone assignment, the 15N, 13C isotopically labeled peptides were prepared at an approximate concentration of 0.5 mM. Standard 3D assignment experiments based on sensitivity-enhanced 1H, 15N HSQC (8 scans, 1024 × 256 complex data points) were collected. These included an HNCACB and a CBCA(CO)NH (8 scans, 1024 (1H) × 32 (15N) × 128 (13C) complex data points, with 11, 24, and 70 ppm as 1H, 15N, and 13C sweep widths, respectively), an HN(CA)CO (8 scans, 1024 (1H) × 32 (15N) × 75 (13C) complex data points, with 11, 24, and 18 ppm as 1H, 15N, and 13C sweep widths), an HNCO (16 scans, 1024 (1H) × 32 (15N) × 75 (13C) complex data points, with 11, 32, and 22 ppm as 1H, 15N, and 13C sweep widths), and an HNCA (16 scans, 1024 (1H) × 32 (15N) × 95 (13C) complex data points, with 16, 25, and 30 ppm as 1H, 15N, and 13C sweep widths). Additionally, HN(CA)NNH spectra (16 scans, 1024 (1H) × 32 (15N F1) × 60 (15N F2) complex data points, with 11, 24, and 24 ppm as 1H, 15N F1, and 15N F2 sweep widths) provided connectivity between i and i ± 1 amide nitrogen nuclei. Data were processed using Bruker TopSpin version 3.4 and NMRPipe93 (v.7.9) and analyzed using NMRFAM-SPARKY.94 HSQC spectra were acquired for unlabeled and Cy3B-, CF660R-, Alexa 488-, and Alexa 594-labeled sNh+ (16 scans, 1024 × 256 complex data points) at an approximate concentration of 0.1 mM. Assignments were transferred to labeled sNh+ based on the assignments of the unlabeled protein. Broadening of the HSQC resonances was quantified as the ratio of the peak height of the labeled protein to that of the unlabeled protein.
Chemical shift perturbation values were calculated as
![]() |
Small-Angle X-ray Scattering
SAXS experiments were performed on the BioCAT (18-ID-D) beamline at the Advanced Photon Source at Argonne National Laboratory.95 Linker IDR samples were measured by coupling size-exclusion chromatography to a coflow X-ray sample chamber.96 In short, a 5/150 Superdex 75 Increase column (Cytiva) was equilibrated in a buffer containing 20 mM KH2PO4/K2HPO4, 0.125 M KCl, pH 7.3, and 10 mM DTT. Elution of protein from the column was monitored by UV absorbance at 220 nm and by the integrated X-ray scattering intensity. Data reduction was performed at the beamline using the BioXTAS RAW software package.97 Subsequent analysis and averaging of the SEC-SAXS data were performed using custom Matlab routines98 (Mathworks).
Atomistic Simulations and Reweighting
Atomistic simulations of each IDR were performed using an in-house adaptation of version 3 of the CAMPARI Monte Carlo simulation package (http://campari.sourceforge.net) with the ABSINTH implicit solvation model and force field paradigm.41,78 For each sequence, five independent simulations were performed in spherical droplets with radii of 150 Å. The simulations used a modified abs3.2_opls.prm parameter file with explicit representations of ions,99 and the radius of sodium ions was set to 1.81 Å to avoid broken ergodicity due to ion chelation effects, especially around acidic groups. Neutralizing and excess Na+ and Cl– ions were modeled explicitly, with an excess NaCl concentration of 20 mM. Simulations were performed at 340 K for 6.15 × 10⁷ steps, of which the first 1 × 10⁷ steps were treated as equilibration. The move set included translational, side chain rotation, concerted rotation, pivot, and proline puckering moves.78
For each replica, 1030 frames were saved, and dyes were added to each frame using our in-house program COCOFRET. Briefly, for each frame, 50 attempts were made to attach Alexa 488 to the first Cys, and attempts were discarded if the dye caused steric clashes with the IDR. Likewise, 50 separate attempts were made to attach Alexa 594 to the second Cys, again discarding attempts that caused steric clashes with the IDR. Dyes were attached by randomly selecting a rotamer from the HandyFRET rotamer library (http://karri.anu.edu.au/handy/rl.html) while ensuring ideal bond angles and bond lengths at the γ-sulfur. Clashes were defined as any atoms within 5 Å of each other. If at least 20 Alexa 488 and 20 Alexa 594 dyes were attached successfully, all combinations of the attached Alexa 488 and Alexa 594 dyes were evaluated for the given frame, and the interdye distance was saved for every combination in which the two dyes did not clash with each other. Transfer efficiencies were calculated from these distances using the Förster equation with R0 = 6 nm. For each frame, the mean transfer efficiency was calculated and used for the reweighting procedure.
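The per-frame averaging step above can be sketched as follows, assuming that each accepted dye placement is reduced to a single hypothetical reference point (for example, the chromophore center) and that the atom-level dye–dye clash check has already been applied or is neglected. The 20-placement minimum and R0 = 6 nm mirror the values in the text.

```python
import numpy as np


def forster_efficiency(r_nm, r0_nm=6.0):
    """Förster equation E(r) = 1 / (1 + (r/R0)^6)."""
    r = np.asarray(r_nm, dtype=float)
    return 1.0 / (1.0 + (r / r0_nm) ** 6)


def frame_mean_efficiency(donor_xyz, acceptor_xyz, r0_nm=6.0, min_placements=20):
    """Mean E over all donor-acceptor combinations accepted for one frame.

    Each row of donor_xyz / acceptor_xyz (nm) is a hypothetical single reference point for one
    successfully attached dye; dye-dye clash checks are omitted in this sketch.
    Returns None when too few dyes could be placed, mirroring the 20-placement criterion.
    """
    d = np.asarray(donor_xyz, dtype=float).reshape(-1, 3)
    a = np.asarray(acceptor_xyz, dtype=float).reshape(-1, 3)
    if len(d) < min_placements or len(a) < min_placements:
        return None
    r = np.linalg.norm(d[:, None, :] - a[None, :, :], axis=-1)  # all n_d x n_a distances
    return float(forster_efficiency(r, r0_nm).mean())
```

Averaging E over dye placements within a frame, rather than averaging distances first, matters because E(r) is strongly nonlinear near R0.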
The maximum entropy method COPER was used to reweight the simulation ensembles to match the experimental mean transfer efficiencies.100 Briefly, the experimental mean transfer efficiencies and their associated errors listed in Table S2 were used as inputs to generate per-frame weights that yield a global solution satisfying these inputs. The generated weights were then used to quantify conformational properties of the simulated ensembles.
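For intuition, the generic maximum entropy solution for a single averaged observable is an exponential tilt of the prior weights, wi ∝ pi exp(−λEi), with λ chosen so that the weighted mean matches the target. The sketch below finds λ by bisection and matches the target exactly; unlike COPER, it does not account for the experimental uncertainty, so it should be read as a simplified illustration of the principle rather than the method used here.

```python
import numpy as np


def maxent_weights(obs, target, prior=None, lam_max=1e3, n_iter=200):
    """Minimal-relative-entropy weights whose weighted mean of `obs` equals `target`."""
    obs = np.asarray(obs, dtype=float)
    p0 = np.full(obs.size, 1.0 / obs.size) if prior is None else np.asarray(prior, float) / np.sum(prior)
    if not obs.min() < target < obs.max():
        raise ValueError("target must lie strictly within the range of per-frame values")

    def weights(lam):
        a = np.log(p0) - lam * obs
        a -= a.max()  # subtract the maximum exponent for numerical stability
        w = np.exp(a)
        return w / w.sum()

    lo, hi = -lam_max, lam_max
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        # The weighted mean decreases monotonically with lambda
        if weights(mid) @ obs > target:
            lo = mid
        else:
            hi = mid
    return weights(0.5 * (lo + hi))


def effective_sample_size(w):
    """Kish effective number of frames, 1 / sum(w_i^2)."""
    return 1.0 / float(np.sum(np.asarray(w) ** 2))
```

A small effective sample size after reweighting signals that the prior ensemble overlaps poorly with the experimental constraint, which is the situation described for the sequences that required extensive reweighting.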
Analyses of ABSINTH Simulations
All analyses were performed using the Python-based simulation analysis package SOURSOP.101 The weights obtained from COPER were used as inputs for the various analysis routines. Internal scaling profiles were calculated using the get_internal_scaling_RMS() analysis routine. The get_scaling_exponent() routine was used to extract best estimates of A0, the prefactor that reports on the chain persistence length, and the scaling exponent, ν, from the standard homopolymer relationship $\langle R_{i,j} \rangle = \langle \langle r_{i,j}^2 \rangle^{1/2} \rangle = A_0 |i - j|^{\nu}$ for each simulated ensemble. The extracted A0 and ν values were then used as inputs to the analysis routine get_polymer_scaled_distance_map(), which compares all interresidue distances with the best-fit homopolymer scaling behavior. The “scaled” mode was used, which divides each weighted distance by the corresponding best-fit homopolymer model distance. Contact information was extracted using the analysis routine get_contact_map() with the mode “closest-heavy” and a contact distance threshold of 10 Å. The radius of gyration, asphericity, and secondary structure per frame were calculated using the get_radius_of_gyration(), get_asphericity(), and get_secondary_structure_DSSP() analysis routines, respectively. Contact networks were generated using the Python package NetworkX. When contacts (nodes in Figure 3E and Figure S10) and distance averages (Figure S10) were extracted per residue i, averages were taken only over residues j more than two residues away in sequence, i.e., j > i + 2 or j < i – 2.
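The scaling analysis above can be reproduced in outline with NumPy: compute weighted mean-square distances for all residue pairs, average them at each sequence separation, fit log R = log A0 + ν log|i − j|, and divide the distance map by the fitted homopolymer reference. This is an independent sketch, not the SOURSOP code; SOURSOP's defaults (for example, which separations enter the fit) may differ, so `min_sep` is a hypothetical knob.

```python
import numpy as np


def internal_scaling(coords, weights=None):
    """Weighted RMS distance per sequence separation and the full RMS distance map.

    coords: (n_frames, n_res, 3); weights: per-frame weights (uniform if None).
    """
    x = np.asarray(coords, dtype=float)
    n_frames, n_res, _ = x.shape
    w = np.full(n_frames, 1.0 / n_frames) if weights is None else np.asarray(weights, float) / np.sum(weights)
    d2 = np.sum((x[:, :, None, :] - x[:, None, :, :]) ** 2, axis=-1)  # (F, N, N)
    mean_d2 = np.tensordot(w, d2, axes=1)                               # weighted <r_ij^2>
    seps = np.arange(1, n_res)
    rms = np.array([np.sqrt(np.diagonal(mean_d2, offset=s).mean()) for s in seps])
    return seps, rms, np.sqrt(mean_d2)


def fit_scaling(seps, rms, min_sep=1):
    """Fit R(|i-j|) = A0 |i-j|^nu by linear regression in log-log space."""
    sel = seps >= min_sep
    nu, log_a0 = np.polyfit(np.log(seps[sel]), np.log(rms[sel]), 1)
    return float(np.exp(log_a0)), float(nu)


def scaled_distance_map(rms_map, a0, nu):
    """Divide each RMS distance by the homopolymer reference A0 |i-j|^nu (NaN on the diagonal)."""
    n = rms_map.shape[0]
    sep = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :]).astype(float)
    out = np.full_like(rms_map, np.nan)
    np.divide(rms_map, a0 * sep ** nu, out=out, where=sep > 0)
    return out
```

Values above 1 in the scaled map mark residue pairs that are locally more expanded than the best-fit homopolymer, and values below 1 mark local compaction.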
Hydrophobicity Scale (HPS) Model Optimization and Simulations
We used the hydrophobicity scale (HPS) model representation of disordered proteins,63 in which each residue is represented by a single bead whose size is based on average residue volumes in crystal structures, and consecutive beads are linked by harmonic bonds with an equilibrium length of 0.38 nm and a spring constant of 481.4 kJ nm−2 mol−1. The interactions of each bead are determined by a scalar parameter λ characterizing its “stickiness” toward other beads. In the original model, λ was based on hydrophobicity scales, but it should not be interpreted literally as hydrophobicity. Pairwise interactions between the beads are described by a modified Weeks–Chandler–Andersen potential in which the attractive part is determined by the arithmetic mean of the λ values of the two beads. Further details are as described by Dannenhoffer-Lafage and Best.38 The dyes were represented as shown in Figure S5, and the force field parameters are given in Table S3. The dye beads are linked by harmonic bonds with an equilibrium length of 0.38 nm and a spring constant of 481.4 kJ nm−2 mol−1, and the shapes of the dyes are maintained by harmonic angle potentials with spring constants of 48.14 kJ rad−2 mol−1. The equilibrium angles were π/2 at branch points and π elsewhere. Note that harmonic angle potentials were applied only to the colored beads in Figure S5 and not to the dye linker beads. The mass of each dye bead was set to 100 atomic mass units.
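HPS-type models commonly implement the λ-dependent short-range term in the Ashbaugh–Hatch form: a Lennard-Jones core shifted by (1 − λ)ε inside the minimum at 2^(1/6)σ and a tail scaled by λ outside it, with σ and λ taken as arithmetic means over the two beads. The sketch below assumes ε = 0.8368 kJ/mol (0.2 kcal/mol) and omits the interaction cutoff and the screened electrostatics; the exact choices used here are those of Dannenhoffer-Lafage and Best. Dye beads would enter the same function with their own σ and λ.

```python
import numpy as np

EPSILON_KJ = 0.8368  # assumed: 0.2 kcal/mol, the well depth commonly used in HPS-type models


def ashbaugh_hatch(r, sigma_i, sigma_j, lam_i, lam_j, eps=EPSILON_KJ):
    """Ashbaugh-Hatch pair energy (kJ/mol) for beads i and j at distance r (nm)."""
    sigma = 0.5 * (sigma_i + sigma_j)  # arithmetic mean of bead diameters
    lam = 0.5 * (lam_i + lam_j)        # arithmetic mean of stickiness parameters
    r = np.asarray(r, dtype=float)
    lj = 4.0 * eps * ((sigma / r) ** 12 - (sigma / r) ** 6)
    r_min = 2.0 ** (1.0 / 6.0) * sigma
    # Repulsive core shifted by (1 - lam) * eps; attractive tail scaled by lam.
    # Both branches equal -lam * eps at r_min, so the potential is continuous.
    return np.where(r <= r_min, lj + (1.0 - lam) * eps, lam * lj)
```

At λ = 0, a pair feels only the purely repulsive WCA core; at λ = 1, the full Lennard-Jones potential is recovered. This is why λ acts as a stickiness parameter rather than a literal hydrophobicity.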
Langevin dynamics was propagated at a temperature of 300 K with a time step of 10 fs and a friction coefficient of 1.0 ps–1 using the LAMMPS package, with typical run lengths of 600 ns for each protein, discarding the first 100 ns for equilibration. Run input files and final parameters are available on Zenodo at 10.5281/zenodo.11397637. Force balance optimization was performed as described by Dannenhoffer-Lafage and Best,38 targeting the FRET efficiencies with a learning rate of 5.0. The initial parameters for the protein were obtained from the M1 parameter set of Tesei et al.,40 whereas those for the dyes were obtained from a grid search over dye and linker λ values. In addition, L2 regularization toward the initial parameters was employed with a regularization strength of 0.0001. Simulations were performed at an ionic strength of 185 mM, with CF660R or Alexa 594 attached to Cys5 and Cy3B or Alexa 488 attached to Cys64. Permuting the dyes changed ⟨E⟩ by 0.018 on average, so dye permutation was not taken into account in our comparison. Benchmark simulations of the CALVADOS data set40 were performed with the CALVADOS M1 parameters and with our optimized “Linker-HPS” parameters (Figure S11).
Simulations of PNt, ibb, n49, nul, and nus were propagated at 298 K with a time step of 10 fs, a friction coefficient of 1.0 ps–1, and a Debye length corresponding to an ionic strength of 190 mM for PNt and 165 mM for ibb, n49, nul, and nus. PNt was simulated for 60 μs, and the first 10 μs was discarded for equilibration. The simulations of ibb, n49, nul, and nus were propagated for 10 μs, and the first microsecond was discarded for equilibration.
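The mapping from ionic strength to Debye screening length used for the electrostatic term follows λD = [ε0 εr kB T / (2 NA e² I)]^(1/2). The sketch below assumes a temperature-independent relative permittivity of 78.4 for water; coarse-grained models often use a temperature-dependent εr instead, which shifts the result slightly.

```python
import math


def debye_length_nm(ionic_strength_molar, temperature_k=298.0, eps_r=78.4):
    """Debye screening length (nm) of a 1:1 electrolyte solution at the given ionic strength."""
    eps0 = 8.8541878128e-12   # vacuum permittivity, F/m
    kb = 1.380649e-23         # Boltzmann constant, J/K
    na = 6.02214076e23        # Avogadro constant, 1/mol
    e = 1.602176634e-19       # elementary charge, C
    ionic_si = ionic_strength_molar * 1e3  # mol/L -> mol/m^3
    lam = math.sqrt(eps0 * eps_r * kb * temperature_k / (2.0 * na * e ** 2 * ionic_si))
    return lam * 1e9
```

Because λD scales as I^(−1/2), the difference between 165 and 190 mM changes the screening length by only about 7%.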
Acknowledgments
We thank Lucia Franchini, Daniela Oswald, Gessica Priola, Moa Hasler, Vincent Maximilian Münch, Steffen Winkler, and Dina Heiligensetzer for their assistance in sample preparation, single-molecule measurements, and circular dichroism spectroscopy measurements, and Soundhar Gopi and Miloš Ivanović for helpful discussion. We thank the Functional Genomics Center Zurich for mass spectrometry analysis. This work was supported by the Swiss National Science Foundation (grant 310030_197776, B.S.), the European Union’s Horizon 2020 Research and Innovation Programme under the Marie Skłodowska-Curie grant agreement ID no. 898228 (A.C.), the US National Institutes of Health (grant R01NS121114, T.M. and R.V.P.), the US National Science Foundation (grant MCB-2227268, R.V.P.), the American Lebanese Syrian Associated Charities (to T.M.), and by the Intramural Research Program of the National Institute of Diabetes and Digestive and Kidney Diseases of the National Institutes of Health (to T.D.-L, J.M.L., and R.B.B.). This research used resources of the Advanced Photon Source, which is a U.S. Department of Energy (DOE) Office of Science User Facility operated for the DOE Office of Science by Argonne National Laboratory under Contract No. DE-AC02-06CH11357. BioCAT was supported by grant P30 GM138395 from the National Institute of General Medical Sciences of the National Institutes of Health. In addition, the computational resources of the NIH HPC Biowulf cluster were utilized (http://hpc.nih.gov).
Supporting Information Available
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/jacsau.4c00673.
Selection scheme, disorder scores, and circular dichroism spectra for the 16 IDR sequences, single-molecule analysis of sκh labeled with Cy3B/CF660R, coarse-grained dye structures, SAXS and NMR data, correlations between different sequence parameters and average transfer efficiency, and additional ABSINTH and HPS model analyses (Figures S1−S13); summary of UniProt11 identifiers and important physicochemical parameters of the IDRs, original (CALVADOS 112 M3), optimized, and CALVADOS 213 short-range interaction parameters (λ) (Tables S1−S3) (PDF)
Author Present Address
∇ Federal Food Safety and Veterinary Office, 3003 Bern, Switzerland
Author Contributions
CRediT: Andrea Holla conceptualization, formal analysis, investigation, visualization, writing - original draft, writing - review & editing; Erik Martin formal analysis, investigation, visualization, writing - review & editing; Thomas Dannenhoffer-Lafage formal analysis, investigation, methodology, visualization, writing - review & editing; Kiersten M. Ruff formal analysis, investigation, visualization, writing - review & editing; Sebastian L. B. König formal analysis, investigation; Mark F. Nüesch formal analysis, investigation; Aritra Chowdhury formal analysis, investigation, writing - review & editing; John M. Louis investigation, writing - review & editing; Andrea Soranno conceptualization, formal analysis, investigation, writing - review & editing; Daniel Nettels formal analysis, software, writing - review & editing; Rohit V Pappu conceptualization, funding acquisition, resources, supervision, writing - review & editing; Robert B. Best conceptualization, formal analysis, investigation, methodology, resources, supervision, writing - review & editing; Tanja Mittag conceptualization, funding acquisition, resources, supervision, writing - review & editing; Benjamin Schuler conceptualization, formal analysis, funding acquisition, resources, supervision, writing - original draft, writing - review & editing.
The authors declare no competing financial interest.
References
- Holehouse A. S.; Kragelund B. B. The molecular basis for cellular function of intrinsically disordered protein regions. Nat. Rev. Mol. Cell Biol. 2024, 25, 187–211. 10.1038/s41580-023-00673-0.
- Tsang B.; Pritisanac I.; Scherer S. W.; Moses A. M.; Forman-Kay J. D. Phase Separation as a Missing Mechanism for Interpretation of Disease Mutations. Cell 2020, 183, 1742–1756. 10.1016/j.cell.2020.11.050.
- van der Lee R.; Buljan M.; Lang B.; Weatheritt R. J.; Daughdrill G. W.; Dunker A. K.; Fuxreiter M.; Gough J.; Gsponer J.; Jones D. T.; Kim P. M.; Kriwacki R. W.; Oldfield C. J.; Pappu R. V.; Tompa P.; Uversky V. N.; Wright P. E.; Babu M. M. Classification of intrinsically disordered regions and proteins. Chem. Rev. 2014, 114, 6589–6631. 10.1021/cr400525m.
- Banani S. F.; Lee H. O.; Hyman A. A.; Rosen M. K. Biomolecular condensates: organizers of cellular biochemistry. Nat. Rev. Mol. Cell Biol. 2017, 18, 285–298. 10.1038/nrm.2017.7.
- Uversky V. N. What does it mean to be natively unfolded? Eur. J. Biochem. 2002, 269, 2–12. 10.1046/j.0014-2956.2001.02649.x.
- Crick S. L.; Jayaraman M.; Frieden C.; Wetzel R.; Pappu R. V. Fluorescence correlation spectroscopy shows that monomeric polyglutamine molecules form collapsed structures in aqueous solutions. Proc. Natl. Acad. Sci. U.S.A. 2006, 103, 16764–16769. 10.1073/pnas.0608175103.
- Müller-Späth S.; Soranno A.; Hirschfeld V.; Hofmann H.; Rüegger S.; Reymond L.; Nettels D.; Schuler B. Charge interactions can dominate the dimensions of intrinsically disordered proteins. Proc. Natl. Acad. Sci. U.S.A. 2010, 107, 14609–14614. 10.1073/pnas.1001743107.
- Mao A. H.; Crick S. L.; Vitalis A.; Chicoine C. L.; Pappu R. V. Net charge per residue modulates conformational ensembles of intrinsically disordered proteins. Proc. Natl. Acad. Sci. U.S.A. 2010, 107, 8183–8188. 10.1073/pnas.0911107107.
- Hofmann H.; Soranno A.; Borgia A.; Gast K.; Nettels D.; Schuler B. Polymer scaling laws of unfolded and intrinsically disordered proteins quantified with single-molecule spectroscopy. Proc. Natl. Acad. Sci. U.S.A. 2012, 109, 16155–16160. 10.1073/pnas.1207719109.
- Das R. K.; Pappu R. V. Conformations of intrinsically disordered proteins are influenced by linear sequence distributions of oppositely charged residues. Proc. Natl. Acad. Sci. U.S.A. 2013, 110, 13392. 10.1073/pnas.1304749110.
- Sawle L.; Ghosh K. A theoretical method to compute sequence dependent configurational properties in charged polymers and proteins. J. Chem. Phys. 2015, 143, 085101. 10.1063/1.4929391.
- Fuertes G.; Banterle N.; Ruff K. M.; Chowdhury A.; Mercadante D.; Koehler C.; Kachala M.; Estrada Girona G.; Milles S.; Mishra A.; Onck P. R.; Grater F.; Esteban-Martin S.; Pappu R. V.; Svergun D. I.; Lemke E. A. Decoupling of size and shape fluctuations in heteropolymeric sequences reconciles discrepancies in SAXS vs. FRET measurements. Proc. Natl. Acad. Sci. U.S.A. 2017, 114, E6342–E6351. 10.1073/pnas.1704692114.
- Best R. B.; Zheng W.; Borgia A.; Buholzer K.; Borgia M. B.; Hofmann H.; Soranno A.; Nettels D.; Gast K.; Grishaev A.; Schuler B. Comment on "Innovative scattering analysis shows that hydrophobic disordered proteins are expanded in water". Science 2018, 361 (6405), eaar7101. 10.1126/science.aar7101.
- Holehouse A. S.; Pappu R. V. Collapse Transitions of Proteins and the Interplay Among Backbone, Sidechain, and Solvent Interactions. Annu. Rev. Biophys. 2018, 47, 19–39. 10.1146/annurev-biophys-070317-032838.
- Shea J. E.; Best R. B.; Mittal J. Physics-based computational and theoretical approaches to intrinsically disordered proteins. Curr. Opin. Struct. Biol. 2021, 67, 219–225. 10.1016/j.sbi.2020.12.012.
- Tesei G.; Trolle A. I.; Jonsson N.; Betz J.; Knudsen F. E.; Pesce F.; Johansson K. E.; Lindorff-Larsen K. Conformational ensembles of the human intrinsically disordered proteome. Nature 2024, 626, 897–904. 10.1038/s41586-023-07004-5.
- O’Brien E. P.; Morrison G.; Brooks B. R.; Thirumalai D. How accurate are polymer models in the analysis of Förster resonance energy transfer experiments on proteins? J. Chem. Phys. 2009, 130 (12), 124903. 10.1063/1.3082151.
- Schuler B.; Soranno A.; Hofmann H.; Nettels D. Single-Molecule FRET Spectroscopy and the Polymer Physics of Unfolded and Intrinsically Disordered Proteins. Annu. Rev. Biophys. 2016, 45, 207–231. 10.1146/annurev-biophys-062215-010915.
- Mao A. H.; Lyle N.; Pappu R. V. Describing sequence-ensemble relationships for intrinsically disordered proteins. Biochem. J. 2013, 449, 307–318. 10.1042/BJ20121346.
- Teufel D. P.; Johnson C. M.; Lum J. K.; Neuweiler H. Backbone-Driven Collapse in Unfolded Protein Chains. J. Mol. Biol. 2011, 409, 250–262. 10.1016/j.jmb.2011.03.066.
- Borgia A.; Zheng W.; Buholzer K.; Borgia M. B.; Schuler A.; Hofmann H.; Soranno A.; Nettels D.; Gast K.; Grishaev A.; Best R. B.; Schuler B. Consistent View of Polypeptide Chain Expansion in Chemical Denaturants from Multiple Experimental Methods. J. Am. Chem. Soc. 2016, 138, 11714–11726. 10.1021/jacs.6b05917.
- Rubinstein M.; Colby R. H. Polymer Physics; Oxford University Press: Oxford, New York, 2003; p xi, 440 p.
- Yu M.; Heidari M.; Mikhaleva S.; Tan P. S.; Mingu S.; Ruan H.; Reinkemeier C. D.; Obarska-Kosinska A.; Siggel M.; Beck M.; Hummer G.; Lemke E. A. Visualizing the disordered nuclear transport machinery in situ. Nature 2023, 617, 162–169. 10.1038/s41586-023-05990-0.
- Riback J. A.; Bowman M. A.; Zmyslowski A. M.; Knoverek C. R.; Jumper J. M.; Hinshaw J. R.; Kaye E. B.; Freed K. F.; Clark P. L.; Sosnick T. R. Innovative scattering analysis shows that hydrophobic disordered proteins are expanded in water. Science 2017, 358, 238–241. 10.1126/science.aan5774.
- Lotthammer J. M.; Ginell G. M.; Griffith D.; Emenecker R. J.; Holehouse A. S. Direct prediction of intrinsically disordered protein conformational properties from sequence. Nat. Methods 2024, 21, 465–476. 10.1038/s41592-023-02159-5.
- Zheng W.; Zerze G. H.; Borgia A.; Mittal J.; Schuler B.; Best R. B. Inferring properties of disordered chains from FRET transfer efficiencies. J. Chem. Phys. 2018, 148, 123329. 10.1063/1.5006954.
- Valle F.; Favre M.; De Los Rios P.; Rosa A.; Dietler G. Scaling Exponents and Probability Distributions of DNA End-to-End Distance. Phys. Rev. Lett. 2005, 95, 158105. 10.1103/PhysRevLett.95.158105.
- Steinhauser M. O. A molecular dynamics study on universal properties of polymer chains in different solvent qualities. Part I. A review of linear chain properties. J. Chem. Phys. 2005, 122, 094901. 10.1063/1.1846651.
- Sanchez I. C. Phase-Transition Behavior of the Isolated Polymer-Chain. Macromolecules 1979, 12, 980–988. 10.1021/ma60071a040.
- Baul U.; Chakraborty D.; Mugnai M. L.; Straub J. E.; Thirumalai D. Sequence Effects on Size, Shape, and Structural Heterogeneity in Intrinsically Disordered Proteins. J. Phys. Chem. B 2019, 123, 3462–3474. 10.1021/acs.jpcb.9b02575.
- Marsh J. A.; Forman-Kay J. D. Sequence Determinants of Compaction in Intrinsically Disordered Proteins. Biophys. J. 2010, 98, 2383–2390. 10.1016/j.bpj.2010.02.006.
- Das R. K.; Ruff K. M.; Pappu R. V. Relating sequence encoded information to form and function of intrinsically disordered proteins. Curr. Opin. Struct. Biol. 2015, 32, 102–112. 10.1016/j.sbi.2015.03.008.
- Martin E. W.; Holehouse A. S.; Grace C. R.; Hughes A.; Pappu R. V.; Mittag T. Sequence Determinants of the Conformational Properties of an Intrinsically Disordered Protein Prior to and upon Multisite Phosphorylation. J. Am. Chem. Soc. 2016, 138, 15323–15335. 10.1021/jacs.6b10272.
- Best R. B. Emerging consensus on the collapse of unfolded and intrinsically disordered proteins in water. Curr. Opin. Struct. Biol. 2020, 60, 27–38. 10.1016/j.sbi.2019.10.009.
- Zheng W.; Dignon G.; Brown M.; Kim Y. C.; Mittal J. Hydropathy Patterning Complements Charge Patterning to Describe Conformational Preferences of Disordered Proteins. J. Phys. Chem. Lett. 2020, 11, 3408–3415. 10.1021/acs.jpclett.0c00288.
- Vuzman D.; Levy Y. DNA search efficiency is modulated by charge composition and distribution in the intrinsically disordered tail. Proc. Natl. Acad. Sci. U.S.A. 2010, 107, 21004–21009. 10.1073/pnas.1011775107.
- Borgia A.; Borgia M. B.; Bugge K.; Kissling V. M.; Heidarsson P. O.; Fernandes C. B.; Sottini A.; Soranno A.; Buholzer K. J.; Nettels D.; Kragelund B. B.; Best R. B.; Schuler B. Extreme disorder in an ultrahigh-affinity protein complex. Nature 2018, 555, 61–66. 10.1038/nature25762.
- Dannenhoffer-Lafage T.; Best R. B. A Data-Driven Hydrophobicity Scale for Predicting Liquid-Liquid Phase Separation of Proteins. J. Phys. Chem. B 2021, 125, 4046–4056. 10.1021/acs.jpcb.0c11479.
- Holmstrom E. D.; Holla A.; Zheng W.; Nettels D.; Best R. B.; Schuler B. Accurate Transfer Efficiencies, Distance Distributions, and Ensembles of Unfolded and Intrinsically Disordered Proteins From Single-Molecule FRET. In Methods in Enzymology; Elsevier, 2018; Vol. 611, pp 287–325. 10.1016/bs.mie.2018.09.030.
- Tesei G.; Schulze T. K.; Crehuet R.; Lindorff-Larsen K. Accurate model of liquid-liquid phase behavior of intrinsically disordered proteins from optimization of single-chain properties. Proc. Natl. Acad. Sci. U.S.A. 2021, 118 (44), e2111696118. 10.1073/pnas.2111696118.
- Vitalis A.; Pappu R. V. ABSINTH: a new continuum solvation model for simulations of polypeptides in aqueous solutions. J. Comput. Chem. 2009, 30, 673–699. 10.1002/jcc.21005.
- Liao J. Y.; Yang B.; Zhang Y. C.; Wang X. J.; Ye Y.; Peng J. W.; Yang Z. Z.; He J. H.; Zhang Y.; Hu K.; Lin D. C.; Yin D. EuRBPDB: a comprehensive resource for annotation, functional and oncological investigation of eukaryotic RNA binding proteins (RBPs). Nucleic Acids Res. 2020, 48, D307–D313. 10.1093/nar/gkz823.
- Varadi M.; Zsolyomi F.; Guharoy M.; Tompa P. Functional Advantages of Conserved Intrinsic Disorder in RNA-Binding Proteins. PLoS One 2015, 10 (10), e0139731. 10.1371/journal.pone.0139731.
- González-Foutel N. S.; Glavina J.; Borcherds W. M.; Safranchik M.; Barrera-Vilarmau S.; Sagar A.; Estaña A.; Barozet A.; Garrone N. A.; Fernandez-Ballester G.; Blanes-Mira C.; Sánchez I. E.; de Prat-Gay G.; Cortés J.; Bernadó P.; Pappu R. V.; Holehouse A. S.; Daughdrill G. W.; Chemes L. B. Conformational buffering underlies functional selection in intrinsically disordered protein regions. Nat. Struct. Mol. Biol. 2022, 29, 781–790. 10.1038/s41594-022-00811-w.
- The UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 2022, 51 (D1), D523–D531. 10.1093/nar/gkac1052.
- Emenecker R. J.; Griffith D.; Holehouse A. S. Metapredict: a fast, accurate, and easy-to-use predictor of consensus disorder and structure. Biophys. J. 2021, 120, 4312–4319. 10.1016/j.bpj.2021.08.039.
- Uversky V. N.; Gillespie J. R.; Fink A. L. Why are "natively unfolded" proteins unstructured under physiologic conditions? Proteins 2000, 41, 415–427.
- Hillger F.; Nettels D.; Dorsch S.; Schuler B. Detection and analysis of protein aggregation with confocal single molecule fluorescence spectroscopy. J. Fluoresc. 2007, 17, 759–765. 10.1007/s10895-007-0187-z.
- Kudryavtsev V.; Sikor M.; Kalinin S.; Mokranjac D.; Seidel C. A. M.; Lamb D. C. Combining MFD and PIE for Accurate Single-Pair Förster Resonance Energy Transfer Measurements. ChemPhysChem 2012, 13, 1060–1078. 10.1002/cphc.201100822.
- Klose D.; Holla A.; Gmeiner C.; Nettels D.; Ritsch I.; Bross N.; Yulikov M.; Allain F. H.; Schuler B.; Jeschke G. Resolving distance variations by single-molecule FRET and EPR spectroscopy using rotamer libraries. Biophys. J. 2021, 120, 4842–4858. 10.1016/j.bpj.2021.09.021.
- Kyte J.; Doolittle R. F. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 1982, 157, 105–132. 10.1016/0022-2836(82)90515-0.
- Ruff K. M. Predicting Conformational Properties of Intrinsically Disordered Proteins from Sequence. In Intrinsically Disordered Proteins, Methods in Molecular Biology; Springer US, 2020; Vol. 2141, pp 347–389. 10.1007/978-1-0716-0524-0_18.
- Hellenkamp B.; Schmid S.; Doroshenko O.; Opanasyuk O.; Kuhnemuth R.; Rezaei Adariani S.; Ambrose B.; Aznauryan M.; Barth A.; Birkedal V.; Bowen M. E.; Chen H.; Cordes T.; Eilert T.; Fijen C.; Gebhardt C.; Gotz M.; Gouridis G.; Gratton E.; Ha T.; Hao P.; Hanke C. A.; Hartmann A.; Hendrix J.; Hildebrandt L. L.; Hirschfeld V.; Hohlbein J.; Hua B.; Hubner C. G.; Kallis E.; Kapanidis A. N.; Kim J. Y.; Krainer G.; Lamb D. C.; Lee N. K.; Lemke E. A.; Levesque B.; Levitus M.; McCann J. J.; Naredi-Rainer N.; Nettels D.; Ngo T.; Qiu R.; Robb N. C.; Rocker C.; Sanabria H.; Schlierf M.; Schroder T.; Schuler B.; Seidel H.; Streit L.; Thurn J.; Tinnefeld P.; Tyagi S.; Vandenberk N.; Vera A. M.; Weninger K. R.; Wunsch B.; Yanez-Orozco I. S.; Michaelis J.; Seidel C. A. M.; Craggs T. D.; Hugel T. Precision and accuracy of single-molecule FRET measurements-a multi-laboratory benchmark study. Nat. Methods 2018, 15, 669–676. 10.1038/s41592-018-0085-0.
- Lin L.-K. A Concordance Correlation Coefficient to Evaluate Reproducibility. Biometrics 1989, 45, 255–268. 10.2307/2532051.
- Holehouse A. S.; Garai K.; Lyle N.; Vitalis A.; Pappu R. V. Quantitative Assessments of the Distinct Contributions of Polypeptide Backbone Amides versus Side Chain Groups to Chain Expansion via Chemical Denaturation. J. Am. Chem. Soc. 2015, 137, 2984–2995. 10.1021/ja512062h.
- Mukhopadhyay S.; Krishnan R.; Lemke E. A.; Lindquist S.; Deniz A. A. A natively unfolded yeast prion monomer adopts an ensemble of collapsed and rapidly fluctuating structures. Proc. Natl. Acad. Sci. U.S.A. 2007, 104, 2649–2654. 10.1073/pnas.0611503104.
- Tran H. T.; Mao A.; Pappu R. V. Role of Backbone–Solvent Interactions in Determining Conformational Equilibria of Intrinsically Disordered Proteins. J. Am. Chem. Soc. 2008, 130, 7380–7392. 10.1021/ja710446s.
- Walters R. H.; Murphy R. M. Examining Polyglutamine Peptide Length: A Connection between Collapsed Conformations and Increased Aggregation. J. Mol. Biol. 2009, 393, 978–992. 10.1016/j.jmb.2009.08.034.
- Karandur D.; Wong K.-Y.; Pettitt B. M. Solubility and Aggregation of Gly5 in Water. J. Phys. Chem. B 2014, 118, 9565–9572. 10.1021/jp503358n.
- Karandur D.; Harris R. C.; Pettitt B. M. Protein collapse driven against solvation free energy without H-bonds. Protein Sci. 2016, 25, 103–110. 10.1002/pro.2749.
- Choi J.-M.; Pappu R. V. Improvements to the ABSINTH Force Field for Proteins Based on Experimentally Derived Amino Acid Specific Backbone Conformational Statistics. J. Chem. Theory Comput. 2019, 15, 1367–1382. 10.1021/acs.jctc.8b00573.
- Humphrey W.; Dalke A.; Schulten K. VMD: Visual molecular dynamics. J. Mol. Graphics 1996, 14, 33–38. 10.1016/0263-7855(96)00018-5.
- Dignon G. L.; Zheng W.; Kim Y. C.; Best R. B.; Mittal J. Sequence determinants of protein phase behavior from a coarse-grained model. PLoS Comput. Biol. 2018, 14, e1005941. 10.1371/journal.pcbi.1005941.
- Norgaard A. B.; Ferkinghoff-Borg J.; Lindorff-Larsen K. Experimental Parameterization of an Energy Function for the Simulation of Unfolded Proteins. Biophys. J. 2008, 94, 182–192. 10.1529/biophysj.107.108241.
- Wang L. P.; Chen J.; Van Voorhis T. Systematic Parametrization of Polarizable Force Fields from Quantum Chemistry Data. J. Chem. Theory Comput. 2013, 9, 452–460. 10.1021/ct300826t.
- Muschielok A.; Andrecka J.; Jawhari A.; Brückner F.; Cramer P.; Michaelis J. A nano-positioning system for macromolecular structural analysis. Nat. Methods 2008, 5, 965–971. 10.1038/nmeth.1259.
- Sindbert S.; Kalinin S.; Nguyen H.; Kienzler A.; Clima L.; Bannwarth W.; Appel B.; Muller S.; Seidel C. A. Accurate distance determination of nucleic acids via Förster resonance energy transfer: implications of dye linker length and rigidity. J. Am. Chem. Soc. 2011, 133, 2463–2480. 10.1021/ja105725e.
- Klose D.; Klare J. P.; Grohmann D.; Kay C. W. M.; Werner F.; Steinhoff H. J. Simulation vs. Reality: A Comparison of In Silico Distance Predictions with DEER and FRET Measurements. PLoS One 2012, 7 (6), e39492. 10.1371/journal.pone.0039492.
- Warner J. B.; Ruff K. M.; Tang P. S.; Lemke E. A.; Pappu R. V.; Lashuel H. A. Monomeric Huntingtin Exon 1 Has Similar Overall Structural Features for Wild-Type and Pathological Polyglutamine Lengths. J. Am. Chem. Soc. 2017, 139, 14456–14469. 10.1021/jacs.7b06659.
- Grotz K. K.; Nueesch M. F.; Holmstrom E. D.; Heinz M.; Stelzl L. S.; Schuler B.; Hummer G. Dispersion Correction Alleviates Dye Stacking of Single-Stranded DNA and RNA in Simulations of Single-Molecule Fluorescence Experiments. J. Phys. Chem. B 2018, 122, 11626–11639. 10.1021/acs.jpcb.8b07537.
- Montepietra D.; Tesei G.; Martins J. M.; Kunze M. B. A.; Best R. B.; Lindorff-Larsen K. FRETpredict: a Python package for FRET efficiency predictions using rotamer libraries. Commun. Biol. 2024, 7, 298. 10.1038/s42003-024-05910-6.
- Tesei G.; Lindorff-Larsen K. Improved predictions of phase behaviour of intrinsically disordered proteins by tuning the interaction range. Open Res. Eur. 2022, 2, 94. 10.12688/openreseurope.14967.1.
- Pak C. W.; Kosno M.; Holehouse A. S.; Padrick S. B.; Mittal A.; Ali R.; Yunus A. A.; Liu D. R.; Pappu R. V.; Rosen M. K. Sequence Determinants of Intracellular Phase Separation by Complex Coacervation of a Disordered Protein. Mol. Cell 2016, 63, 72–85. 10.1016/j.molcel.2016.05.042.
- Vernon R. M.; Chong P. A.; Tsang B.; Kim T. H.; Bah A.; Farber P.; Lin H.; Forman-Kay J. D. Pi-Pi contacts are an overlooked protein feature relevant to phase separation. eLife 2018, 7, e31486. 10.7554/eLife.31486.
- Martin E. W.; Holehouse A. S.; Peran I.; Farag M.; Incicco J. J.; Bremer A.; Grace C. R.; Soranno A.; Pappu R. V.; Mittag T. Valence and patterning of aromatic residues determine the phase behavior of prion-like domains. Science 2020, 367, 694–699. 10.1126/science.aaw8653.
- Bremer A.; Farag M.; Borcherds W. M.; Peran I.; Martin E. W.; Pappu R. V.; Mittag T. Deciphering how naturally occurring sequence features impact the phase behaviours of disordered prion-like domains. Nat. Chem. 2022, 14, 196–207. 10.1038/s41557-021-00840-w.
- Riback J. A.; Bowman M. A.; Zmyslowski A. M.; Plaxco K. W.; Clark P. L.; Sosnick T. R. Commonly used FRET fluorophores promote collapse of an otherwise disordered protein. Proc. Natl. Acad. Sci. U.S.A. 2019, 116, 8889–8894. 10.1073/pnas.1813038116.
- Radhakrishnan A.; Vitalis A.; Mao A. H.; Steffen A. T.; Pappu R. V. Improved Atomistic Monte Carlo Simulations Demonstrate That Poly-l-Proline Adopts Heterogeneous Ensembles of Conformations of Semi-Rigid Segments Interrupted by Kinks. J. Phys. Chem. B 2012, 116, 6862–6871. 10.1021/jp212637r.
- Haran G. How, when and why proteins collapse: the relation to folding. Curr. Opin. Struct. Biol. 2012, 22, 14–20. 10.1016/j.sbi.2011.10.005.
- Vancraenenbroeck R.; Harel Y. S.; Zheng W.; Hofmann H. Polymer effects modulate binding affinities in disordered proteins. Proc. Natl. Acad. Sci. U.S.A. 2019, 116, 19506–19512. 10.1073/pnas.1904997116.
- Sørensen C. S.; Kjaergaard M. Effective concentrations enforced by intrinsically disordered linkers are governed by polymer physics. Proc. Natl. Acad. Sci. U.S.A. 2019, 116, 23124–23131. 10.1073/pnas.1904813116.
- Masliah G.; Barraud P.; Allain F. H. T. RNA recognition by double-stranded RNA binding domains: a matter of shape and sequence. Cell. Mol. Life Sci. 2013, 70, 1875–1895. 10.1007/s00018-012-1119-x.
- Maris C.; Dominguez C.; Allain F. H. T. The RNA recognition motif, a plastic RNA-binding platform to regulate post-transcriptional gene expression. FEBS J. 2005, 272, 2118–2131. 10.1111/j.1742-4658.2005.04653.x.
- Haenni D.; Zosel F.; Reymond L.; Nettels D.; Schuler B. Intramolecular distances and dynamics from the combined photon statistics of single-molecule FRET and photoinduced electron transfer. J. Phys. Chem. B 2013, 117, 13015–13028. 10.1021/jp402352s.
- Holehouse A. S.; Das R. K.; Ahad J. N.; Richardson M. O.; Pappu R. V. CIDER: Resources to Analyze Sequence-Ensemble Relationships of Intrinsically Disordered Proteins. Biophys. J. 2017, 112, 16–21. 10.1016/j.bpj.2016.11.3200.
- Bao W. J.; Gao Y. G.; Chang Y. G.; Zhang T. Y.; Lin X. J.; Yan X. Z.; Hu H. Y. Highly efficient expression and purification system of small-size protein domains in Escherichia coli for biochemical characterization. Protein Expression Purif. 2006, 47, 599–606. 10.1016/j.pep.2005.11.021.
- Kuipers B. J. H.; Gruppen H. Prediction of molar extinction coefficients of proteins and peptides using UV absorption of the constituent amino acids at 214 nm to enable quantitative reverse phase high-performance liquid chromatography-mass spectrometry analysis. J. Agric. Food Chem. 2007, 55, 5445–5451. 10.1021/jf070337l.
- Müller B. K.; Zaychikov E.; Bräuchle C.; Lamb D. C. Pulsed interleaved excitation. Biophys. J. 2005, 89, 3508–3522. 10.1529/biophysj.105.064766.
- Eggeling C.; Berger S.; Brand L.; Fries J. R.; Schaffer J.; Volkmer A.; Seidel C. A. M. Data registration and selective single-molecule analysis using multi-parameter fluorescence detection. J. Biotechnol. 2001, 86, 163–180. 10.1016/S0168-1656(00)00412-0.
- Nir E.; Michalet X.; Hamadani K. M.; Laurence T. A.; Neuhauser D.; Kovchegov Y.; Weiss S. Shot-Noise Limited Single-Molecule FRET Histograms: Comparison between Theory and Experiments. J. Phys. Chem. B 2006, 110, 22103–22124. 10.1021/jp063483n.
- Schuler B. Application of Single Molecule Förster Resonance Energy Transfer to Protein Folding. In Protein Folding Protocols, Methods in Molecular Biology; Humana Press, 2006; Vol. 350, pp 115–138. 10.1385/1-59745-189-4:115.
- Lee N. K.; Kapanidis A. N.; Wang Y.; Michalet X.; Mukhopadhyay J.; Ebright R. H.; Weiss S. Accurate FRET Measurements within Single Diffusing Biomolecules Using Alternating-Laser Excitation. Biophys. J. 2005, 88, 2939–2953. 10.1529/biophysj.104.054114.
- Delaglio F.; Grzesiek S.; Vuister G. W.; Zhu G.; Pfeifer J.; Bax A. NMRPipe: A multidimensional spectral processing system based on UNIX pipes. J. Biomol. NMR 1995, 6, 277–293. 10.1007/BF00197809.
- Lee W.; Tonelli M.; Markley J. L. NMRFAM-SPARKY: enhanced software for biomolecular NMR spectroscopy. Bioinformatics 2015, 31, 1325–1327. 10.1093/bioinformatics/btu830.
- Fischetti R.; Stepanov S.; Rosenbaum G.; Barrea R.; Black E.; Gore D.; Heurich R.; Kondrashkina E.; Kropf A. J.; Wang S.; Zhang K.; Irving T. C.; Bunker G. B. The BioCAT undulator beamline 18ID: a facility for biological non-crystalline diffraction and X-ray absorption spectroscopy at the Advanced Photon Source. J. Synchrotron Radiat. 2004, 11, 399–405. 10.1107/S0909049504016760.
- Martin E. W.; Hopkins J. B.; Mittag T. Chapter Seven - Small-angle X-ray scattering experiments of monodisperse intrinsically disordered protein samples close to the solubility limit. In Methods in Enzymology; Keating C. D., Ed.; Academic Press, 2021; Vol. 646, pp 185–222.
- Hopkins J. B.; Gillilan R. E.; Skou S. BioXTAS RAW: improvements to a free open-source program for small-angle X-ray scattering data reduction and analysis. J. Appl. Crystallogr. 2017, 50, 1545–1553. 10.1107/S1600576717011438.
- Martin E. W.; Harmon T. S.; Hopkins J. B.; Chakravarthy S.; Incicco J. J.; Schuck P.; Soranno A.; Mittag T. A multi-step nucleation process determines the kinetics of prion-like domain phase separation. Nat. Commun. 2021, 12, 4513. 10.1038/s41467-021-24727-z.
- Mao A. H.; Pappu R. V. Crystal lattice properties fully determine short-range interaction parameters for alkali and halide ions. J. Chem. Phys. 2012, 137, 064104. 10.1063/1.4742068.
- Leung H. T. A.; Bignucolo O.; Aregger R.; Dames S. A.; Mazur A.; Berneche S.; Grzesiek S. A Rigorous and Efficient Method To Reweight Very Large Conformational Ensembles Using Average Experimental Data and To Determine Their Relative Information Content. J. Chem. Theory Comput. 2016, 12, 383–394. 10.1021/acs.jctc.5b00759.
- Lalmansingh J. M.; Keeley A. T.; Ruff K. M.; Pappu R. V.; Holehouse A. S. SOURSOP: A Python Package for the Analysis of Simulations of Intrinsically Disordered Proteins. J. Chem. Theory Comput. 2023, 19, 5609–5620. 10.1021/acs.jctc.3c00190.