Significance
Most computationally designed proteins fail to fold into their designed structures. This low success rate is a major obstacle to expanding the applications of protein design. In previous work, we discovered a small protein fold that was paradoxically challenging to design (only a 2% success rate) even though the fold itself is very simple. Here, we used a recently developed high-throughput approach to comprehensively examine the design rules for this simple fold. By designing over 10,000 proteins and experimentally measuring their folding stability, we discovered the key biophysical properties that determine the stability of these designs. Our results illustrate general lessons for protein design and also demonstrate how high-throughput stability studies can quantify the importance of different biophysical forces.
Keywords: protein design, protein folding, protein engineering
Abstract
Designing entirely new protein structures remains challenging because we do not fully understand the biophysical determinants of folding stability. Yet, some protein folds are easier to design than others. Previous work identified the 43-residue ɑββɑ fold as especially challenging: The best designs had only a 2% success rate, compared to 39 to 87% success for other simple folds [G. J. Rocklin et al., Science 357, 168–175 (2017)]. This suggested the ɑββɑ fold would be a useful model system for gaining a deeper understanding of folding stability determinants and for testing new protein design methods. Here, we designed over 10,000 new ɑββɑ proteins and found over 3,000 of them to fold into stable structures using a high-throughput protease-based assay. NMR, hydrogen-deuterium exchange, circular dichroism, deep mutational scanning, and scrambled sequence control experiments indicated that our stable designs fold into their designed ɑββɑ structures with exceptional stability for their small size. Our large dataset enabled us to quantify the influence of universal stability determinants including nonpolar burial, helix capping, and buried unsatisfied polar atoms, as well as stability determinants unique to the ɑββɑ topology. Our work demonstrates how large-scale design and test cycles can solve challenging design problems while illuminating the biophysical determinants of folding.
Improving our understanding of the determinants of protein stability (1–3) would accelerate biological, biomedical, and biotechnology research. In particular, computational models of protein stability are commonly used for a range of applications, including protein design (4–6), stabilizing naturally occurring proteins (7, 8), and predicting the effects of point mutants (9–11). However, all of these models have important limitations. For example, most computationally designed proteins made by experts fail to fold and function (12–14). Nonexperts avoid computational design techniques because they are not reliable. These challenges stem from our incomplete understanding of the biophysical determinants of folding stability and from the difficulty of encoding these determinants into computational models for practical applications.
Recently, we introduced a high-throughput approach to study protein folding stability that is particularly helpful for improving computational modeling and design. In our approach, we designed thousands of de novo proteins and measured their folding stabilities using a yeast display-based proteolysis assay coupled to next-generation sequencing (15). Several new studies have applied our methodology (16–19) as it has several advantages. First, measuring folding stability for thousands of proteins makes it possible to statistically quantify biophysical features that contribute to stability. Second, examining diverse sequences makes it easier to derive principles that are not specific to a particular protein context. Finally, assaying computationally designed proteins focuses the experimentation on the regions of sequence and structural space that are predicted to be low-energy according to a particular computational model, which is especially useful for improving that model.
We previously used this approach to increase the success rate (i.e., fraction of designs that form stable, folded structures) of de novo miniprotein designs from 6 to 47% (15). Three different protein topologies could be designed very robustly (39 to 87% success), but a fourth topology (ɑββɑ, 43 residues) proved very challenging. Only 2% of ɑββɑ designs folded into stable structures despite the simplicity of the structure and four repeated efforts to improve the design procedure (Fig. 1A). This suggested that our design procedure and stability model were missing something fundamental about the ɑββɑ topology, and that this particular fold could be a useful model system for building a deeper understanding of folding stability. Here, we investigated this by asking two main questions. First, how can we improve our design procedure to obtain a large number of stable ɑββɑ proteins for further analysis? Notably, there are no naturally occurring examples of the 43-residue ɑββɑ fold for us to learn from, although this architecture is similar to the unusual 55-residue ɑββɑ fold of the gpW protein from bacteriophage lambda (20). Second, how do the biophysical and topological features of different ɑββɑ designs combine to determine each protein’s folding stability? We investigated these questions by designing and experimentally testing over 10,000 new ɑββɑ miniproteins using our high-throughput approach. We also examined whether the structure prediction model AlphaFold 2 (21) could be applied to differentiate stable and unstable designs.
Fig. 1.
Design strategy for generating and testing αββα miniproteins. (A) Previously, we performed four iterative design–test–analysis cycles to generate stable αββα miniprotein designs but only achieved a 2% success rate (15). (B) Here, we designed thousands of new αββα miniproteins using Rosetta (6,000 designs in Round 5 and 5,307 designs in Round 6) and experimentally tested them for their folding stability using a combined yeast display and protease sensitivity assay. (C) We then performed computational analysis to identify and understand the relative importance of key stability determinants (e.g., hydrophobic contacts, helix capping, loop patterning, local sequence-structure agreement, and net charge).
Results
Designing αββα Miniproteins Using a Restricted Design Strategy.
We first computationally designed thousands of new αββα miniproteins (“Round 5”) based on lessons learned from our previous four rounds of design (15). All designs were based on a single protein architecture (22) that previously led to the greatest number of stable designs (SI Appendix, Fig. S1A). This architecture restricted our new αββα miniproteins to 14-residue α-helices, 3-residue β-strands, and a specific loop structure (Fig. 1B). In addition, we ensured our designs met strict criteria for buried nonpolar surface area, Rosetta energy, and predicted secondary structure (SI Appendix, Fig. S1B). Finally, we required the middle loop to have a hydrophobic residue, required solvent-facing residues on the β-strands to be polar or charged, set a minimum threshold for the total number of hydrophobic residues, and eliminated Gly, Thr, and Val in helices (SI Appendix, Fig. S1 C and D) (Materials and Methods). We hypothesized these restrictions would increase the success of our new designs because these constraints would enforce the overall αββα topology and build a larger hydrophobic core. However, this would reduce the potential sequence and structural diversity.
Based on this “restricted” design strategy, we generated 28,000 αββα miniproteins using an improved version of the Rosetta score function. This score function was previously parameterized to correlate with our earlier high-throughput data on miniprotein folding stability (23). In addition, we used an improved sequence sampling procedure that minimizes overcompaction and produces more native-like protein cores containing bulky residues (24). Our final set of 6,000 αββα designs were chosen by ranking the predicted stabilities of all 28,000 αββα designs using a linear regression model trained on previous large-scale αββα stability data (15). Because this regression model included a low number of stable designs (60/2,830), we used this model for the practical task of selecting designs, but we did not expect reliable performance. After we ranked our designs, we eliminated designs that were more than 31/43 residues identical to a higher-ranking design. Within our final set of 6,000 designs, the median backbone root-mean-square deviation (RMSD) between any two designs was 2 Å (SI Appendix, Fig. S2) and the median sequence identity was 35% (SI Appendix, Fig. S3). Each design based on this restricted strategy is named HEEH_TK_rd5_####, where HEEH indicates the pattern of α-helices (H) and β-strands (E), TK indicates the designer (author TEK), rd5 indicates these designs follow our four previous efforts (15), and #### is the design number.
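The redundancy filter described above can be sketched as a greedy pass over the ranked designs. This is a minimal illustration, not the authors' implementation: it assumes equal-length sequences compared position by position, and it assumes each design is compared only against higher-ranking designs that were themselves kept. The function names and toy sequences are hypothetical.

```python
def identical_positions(seq_a, seq_b):
    """Count positions at which two equal-length sequences share a residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(a == b for a, b in zip(seq_a, seq_b))


def filter_redundant(ranked_seqs, max_identical=31):
    """Keep designs in rank order (best first), dropping any design that
    shares more than `max_identical` positions with an already-kept design.

    Assumption: comparison is against kept designs only; the source text
    does not say whether discarded designs also count.
    """
    kept = []
    for seq in ranked_seqs:
        if all(identical_positions(seq, k) <= max_identical for k in kept):
            kept.append(seq)
    return kept


# Toy example with 5-residue sequences and a threshold of 3 identical positions.
toy_ranked = ["AAAAA", "AAAAC", "CCCCC"]
toy_kept = filter_redundant(toy_ranked, max_identical=3)
```

For the 43-residue designs, the default `max_identical=31` corresponds to the 31/43 cutoff stated in the text.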
Biophysical Characterization of αββα Miniproteins Using a Restricted Design Strategy.
We measured the folding stabilities of our newly designed αββα miniproteins using the high-throughput protease sensitivity assay introduced previously (Fig. 1B) (15). Briefly, all sequences were synthesized as DNA oligonucleotides in a pooled library. We then used Saccharomyces cerevisiae to express and display our sequences on their cell surface, along with a C-terminal myc tag. Next, we subjected the yeast cells to varying concentrations of trypsin and chymotrypsin (tested separately) (SI Appendix, Fig. S5 A and B) and fluorescently labeled the cells displaying protease-resistant sequences. Finally, we sorted the fluorescently labeled cells by flow cytometry and identified the protease-resistant sequences by deep sequencing (Fig. 1B). Out of the 6,000 designs, only 5,662 designs had sufficient sequencing counts to precisely determine their protease sensitivity, and we used this set of 5,662 designs for our analysis. As previously, we assigned each design a “stability score,” defined as the difference between that sequence’s observed protease sensitivity and the predicted sensitivity of that sequence in its unfolded state. Each one-unit increase in stability score indicates a 10-fold higher amount of protease required to cleave that sequence under assay conditions, compared with the predicted protease concentration required to cleave that sequence in its unfolded state (15). To conservatively identify stable designs, each design’s overall stability score is the minimum of the stability scores observed separately with trypsin and chymotrypsin. We previously observed that sequences of scrambled amino acids (not designed sequences) rarely have stability scores above 1, and so we classify designs as stable when their stability score exceeds 1.
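The stability score definition above can be written out as a short sketch. It assumes that protease sensitivity is summarized as an EC50 concentration and that scores are on a log10 scale, consistent with the statement that one unit corresponds to a 10-fold change in protease; the function names are hypothetical, and the unfolded-state prediction is treated as a given input rather than modeled here.

```python
import math

STABLE_THRESHOLD = 1.0  # designs scoring above this value are classified as stable


def stability_score(ec50_observed, ec50_unfolded_predicted):
    """log10 ratio of the observed EC50 to the EC50 predicted for the
    same sequence in its unfolded state (assumed inputs, same units)."""
    return math.log10(ec50_observed / ec50_unfolded_predicted)


def overall_stability(trypsin_score, chymotrypsin_score):
    """Conservative overall score: the minimum of the two protease scores."""
    return min(trypsin_score, chymotrypsin_score)


def is_stable(score):
    """A design is classified as stable when its score exceeds the threshold."""
    return score > STABLE_THRESHOLD
```

Taking the minimum means a design must resist both proteases to be called stable, which guards against resistance that reflects the cleavage specificity of a single protease.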
Our set of 5,662 designs had an average stability score of 0.81, and we classified 38% of these designs as stable (stability score >1; Fig. 2A). The stable set had a median pairwise sequence identity of 37% (SI Appendix, Fig. S3). This greatly exceeded our previous success rate of 2% (Fig. 1A) (15). We also included control sequences in our library whose residue compositions matched our αββα designs, but with the ordering of the residues scrambled in a specific manner: Polar residues remained polar, nonpolar residues remained nonpolar, and proline and glycine residues remained in their identical positions. In contrast to our designs, almost all scrambled sequences had stability scores <1 with an average stability score of −0.86 (Fig. 2A). This suggests that the protease resistance observed for a subset of designs can be attributed to the folding stability of their designed structures, rather than generic properties of their sequences such as residue composition or patterning. In addition, stability scores measured using trypsin and chymotrypsin were correlated with each other despite the differing specificities of the proteases (SI Appendix, Fig. S5 A and B). This further indicates that our measured stability scores reflect folding stability rather than protease-specific factors.
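The scrambling rule for the control sequences can be illustrated with a minimal sketch. The assignment of residues to the nonpolar class below is an assumption made for illustration (the source text does not list the classes), and the function name is hypothetical.

```python
import random

NONPOLAR = set("AFILMVWY")  # assumed nonpolar class (illustrative only)
FIXED = set("GP")           # Gly and Pro stay at their original positions


def pattern_preserving_scramble(seq, rng=None):
    """Shuffle residues within the nonpolar class and within the polar class
    separately, so the polar/nonpolar pattern and Gly/Pro positions are kept."""
    if rng is None:
        rng = random.Random()
    nonpolar_idx = [i for i, aa in enumerate(seq) if aa in NONPOLAR]
    polar_idx = [i for i, aa in enumerate(seq)
                 if aa not in NONPOLAR and aa not in FIXED]
    out = list(seq)
    for idx in (nonpolar_idx, polar_idx):
        residues = [seq[i] for i in idx]
        rng.shuffle(residues)
        for i, aa in zip(idx, residues):
            out[i] = aa
    return "".join(out)
```

A scramble produced this way has the same residue composition and the same hydrophobic patterning as the design, so any remaining difference in protease resistance points to the specific sequence order.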
Fig. 2.
Experimental testing and analysis of αββα stability determinants from a restricted design strategy. (A) The stability score distributions of designed αββα miniproteins (blue), scrambled sequences (gray), and previously published αββα miniproteins (red) (15); the vertical line at stability score = 1 denotes the threshold above which we consider a design to be stable. (B–G) The relation between individual protein features and stability score. For Rosetta energy, lower values indicate favorable energies, and for local sequence-structure propensity higher values indicate favorable propensity. Black lines show moving averages; red lines show fits to quadratic (F) and linear (G) models. (H) A 10-feature linear regression model was built using normalized data, and the experimental stability scores are compared to the model’s predicted stability scores. (I) The magnitudes of the coefficients from the model based on their importance in the dataset (Left) and their biophysical strength (Right). Error bars indicate 95% CIs from bootstrapping.
We next sought to verify that stable αββα miniproteins folded as designed using several orthogonal approaches. First, we selected six stable αββα designs with varying hydrophobicity values (25) and individually purified them from Escherichia coli (Fig. 3A and SI Appendix, Table S1) for circular dichroism (CD) and thermal denaturation. Protein purification by size-exclusion chromatography revealed that three of the six miniproteins (HEEH_TK_rd5_0420, HEEH_TK_rd5_0614, and HEEH_TK_rd5_0958) predominantly eluted at the expected molecular weight of a monomer, whereas the other three showed both monomeric and dimeric peaks (Fig. 3B).
Fig. 3.
Biophysical characterization of αββα miniproteins made using a restricted design strategy. (A) The stability scores of all αββα miniproteins made using a restricted design strategy are plotted by their hydrophobicity values (25). We selected six miniproteins (red dots) with varying hydrophobicity and (B) purified them by size-exclusion chromatography; vertical lines indicate expected dimeric and monomeric forms of the miniprotein based on a calibration curve (Materials and Methods). (C) Far-ultraviolet CD spectra are shown at 25 °C (black), 95 °C (red), and 25 °C after melting (blue). (D) Thermal denaturation was measured at 220 nm at every 1 °C from 25 °C to 95 °C. (E) Design models highlight positions that are most tolerant (teal) or least tolerant (yellow) to mutations. Key residues that stabilize the protein are shown in stick representation. Each miniprotein’s color scale is different to highlight the relative stabilizing or destabilizing effects within each protein; see SI Appendix, Fig. S9 for complete data. (F) Comparison of HEEH_TK_rd5_0958 Rosetta design model, NMR ensemble, and AlphaFold 2–predicted structures; overlay of the Rosetta design model (gray) and NMR ensemble (rainbow). (G) Opening energies determined by hydrogen–deuterium exchange for HEEH_TK_rd5_0958. Observed measurements are colored red–yellow on a cartoon model and plotted in blue. For residues that exchanged too quickly to measure, the upper limit of ΔGopen is plotted in red. (H) Comparison of HEEH_TK_rd5_0341 Rosetta design model, NMR ensemble, and AlphaFold 2–predicted structures; overlay of the Rosetta design model (gray) and NMR ensemble (rainbow). (I) NMR dimer structure shown in two different perspectives. (J) Opening energies determined by hydrogen–deuterium exchange for HEEH_TK_rd5_0341. Observed measurements are colored red–yellow on a cartoon model and plotted in blue. For residues that exchanged too quickly to measure, the upper limit of ΔGopen is plotted in red.
CD spectra exhibited helical secondary structure and reversible folding after heating to 95 °C (Fig. 3C), but the initial 25 °C and cooled 25 °C return measurements for HEEH_TK_rd5_0341 and HEEH_TK_rd5_3711 were not superimposable, possibly due to aggregation that altered the signal intensity. None of the designs showed a clear melting transition, although designs HEEH_TK_rd5_0958 and HEEH_TK_rd5_3711 lost much of their helical character at 95 °C. In contrast, design HEEH_TK_rd5_0420 was minimally perturbed during melting (Fig. 3D), indicating extreme thermostability.
Next, to spot-check the accuracy of our designed structures, we solved the structures of HEEH_TK_rd5_0958 and HEEH_TK_rd5_0341 by NMR. For HEEH_TK_rd5_0958, the average backbone RMSD of the design model compared to all 20 structures in the NMR ensemble was 1.26 Å (Fig. 3F). For HEEH_TK_rd5_0341, both monomers in the dimeric structure were also very close to the designed monomeric model: The average backbone RMSD of the design model compared to all 40 structures in the NMR ensemble was 1.65 Å (Fig. 3H). The structure was symmetrical so only one heteronuclear single quantum coherence peak was visible for each residue, although 15N NMR relaxation measurements were consistent with the dimeric state. The two monomers come together near the β-hairpin and designed N and C termini, burying hydrophobic residues in that region (Fig. 3I).
To analyze the structural differences between the design models and NMR structures, we quantified the number of contacts a residue in the design model gained or lost in the NMR model (SI Appendix, Fig. S6). Most of the residues gained or lost zero or one contact, indicating the close structural similarity between the design model and NMR ensemble. Because HEEH_TK_rd5_0341 was designed as a monomer but formed a dimer, residues at the dimeric interface (the N and C termini and the hairpin turn) all gained new contacts, changing the environments of these residues (SI Appendix, Fig. S6A). However, as shown by the overall RMSD, these changes did not affect the overall structure of each monomeric subunit.
We also examined the local stability of designs HEEH_TK_rd5_0958 and HEEH_TK_rd5_0341 by hydrogen deuterium exchange (HDX) NMR. The HDX opening free energies revealed differences in local stability in different regions of the topology (Fig. 3 G and J). The most stable secondary structure was Helix 2 for both miniproteins, with opening energies around 4 kcal/mol at 15 °C (compared to ∼2 to 3 kcal/mol in Helix 1). The central β-hairpin was the least stable structure in HEEH_TK_rd5_0341 (Fig. 3J). Four residues in this hairpin (I21, G23, I24, and V26) form intramolecular hydrogen bonds that should protect those amides from exchange (SI Appendix, Fig. S7A) but three of these residues exchanged too quickly in HEEH_TK_rd5_0341 to be measured by NMR. In contrast, three of the four hairpin residues that form intramolecular hydrogen bonds in HEEH_TK_rd5_0958 had measurable protection from exchange (Fig. 3G and SI Appendix, Fig. S7B) and were similarly stable to Helix 1. Overall, the hierarchy of stabilities between Helix 2, Helix 1, and the central β-hairpin suggests the folding energy landscape is not fully cooperative.
The highest opening energy in the monomeric HEEH_TK_rd5_0958 was 4.5 kcal/mol, observed at I35 (Fig. 3 G and J). This highest opening energy typically indicates the global stability of the protein (26), making HEEH_TK_rd5_0958 almost 2 kcal/mol more stable than the previous highest stability observed for a designed αββα structure (15). However, this higher stability was observed at a lower temperature (15 °C instead of 25 °C in ref. 15) and in the presence of D2O, which typically stabilizes proteins.
Stability Determinants of αββα Designs from a Restricted Design Strategy.
We next investigated which design features correlated with folding stability. To this end, we computed over a thousand structural and sequence-based metrics for each design and analyzed whether particular metrics correlated with stability. Several of the strongest individual correlations are shown in Fig. 2. Designs were generally more stable if their Rosetta energy scores were lower (Fig. 2B) and had more hydrophobic residues and hydrophobic side-chain contacts (Fig. 2 C and D). Hydrophobic residue count correlated more strongly with stability than Rosetta energy. Stability also increased if a design’s sequence was highly compatible with its local backbone structure (Materials and Methods and Fig. 2E). Finally, increased net charge destabilized our designs, although the optimal net charge was slightly negative (Fig. 2 F and G). This stability change was approximately linear with the square of the net charge, as expected (27).
We also explored whether specific residues could individually have large influences on the stabilities of the designs. Because all designs are based on an identical architecture, each position in the sequence shares an identical structural role in all designs. Using the binomial test, we identified positions where specific amino acid identities had large and significant effects on the success rates of the designs (SI Appendix, Fig. S8). Two positions near the N and C termini stood out as particularly important. Positions 2 and 39 are near the tips of each helix and contact each other in space (SI Appendix, Fig. S8D). Across the design set, leucine residues at these positions increased the success rate of the designs by 25 to 39%, whereas other residues such as glutamate and tryptophan decreased the success rate by similar amounts. These differences in success rates were highly significant (adjusted P value <10⁻¹⁸) (SI Appendix, Fig. S8C). The importance of these residues suggests that termini of the helices play an especially important role in the overall stability of designed αββα miniproteins.
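The position-specific analysis above can be sketched with an exact two-sided binomial test: for designs carrying a given amino acid at a given position, it asks how surprising their number of stable designs is if they shared the library-wide success rate. This is an illustrative pure-Python version under that assumed null hypothesis. The function names are hypothetical, and the multiple-testing adjustment behind the "adjusted P value" is not shown.

```python
from math import exp, lgamma, log


def log_binom_pmf(k, n, p):
    """Log probability of exactly k successes in n trials (requires 0 < p < 1)."""
    log_comb = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_comb + k * log(p) + (n - k) * log(1 - p)


def binom_test_two_sided(k, n, p):
    """Exact two-sided binomial test: sum the probabilities of every outcome
    that is no more likely than the observed count k under success rate p."""
    observed = log_binom_pmf(k, n, p)
    tol = 1e-9  # tolerance so that tied outcomes are counted despite rounding
    total = 0.0
    for i in range(n + 1):
        lp = log_binom_pmf(i, n, p)
        if lp <= observed + tol:
            total += exp(lp)
    return min(1.0, total)
```

For example, `binom_test_two_sided(k, n, 0.38)` would compare `k` stable designs out of `n` designs sharing a residue identity against the 38% overall success rate reported for Round 5. Working in log space avoids overflow when `n` is in the thousands.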
To further examine individual residue contributions to stability, we performed deep mutational scanning analyses (SI Appendix, Fig. S9) on the six αββα designs whose structures we verified by CD (Fig. 3C). Using our protease sensitivity assay (SI Appendix, Fig. S5 C and D), we measured the folding stability changes for all single mutants of each design (SI Appendix, Fig. S9). Four of the six mutational scans showed many destabilizing mutations from replacing nonpolar residues in both the helices and the strands. A fifth design (HEEH_TK_rd5_0420) showed a similar pattern, but the helical residues seemed less sensitive to mutations than the strands. The high stability score of HEEH_TK_rd5_0420 (at the peak of our assay’s dynamic range) may have limited us from resolving the stability effects of other mutations in the helices (SI Appendix, Fig. S9C). The sixth design (HEEH_TK_rd5_0018) showed many destabilizing substitutions at nonpolar sites in the helices, but only a small number could be observed in the β-hairpin, suggesting the hairpin may be less structured in this design (SI Appendix, Fig. S9A). Overall, the positions that were most sensitive to mutations (change in stability <−1) were found in the buried hydrophobic core (SI Appendix, Figs. S10 and S11), and in particular at large hydrophobic residues (SI Appendix, Fig. S11). In contrast, hydrophobic residues, as well as polar and charged residues, that were more solvent-exposed in the design models were less sensitive to mutation (SI Appendix, Fig. S11). The specific sequence–stability relationships shown in the mutational scanning data suggest that the designs fold into specific structures. Furthermore, the consistency between a nonpolar residue’s burial in the designed models and its sensitivity to mutation (SI Appendix, Fig. S11) provides support that the stable designs fold into their designed structures.
Charged and polar residues also contributed to folding stability, although they were less important than buried hydrophobic residues. The top three polar positions that were most sensitive to mutation (average change in stability <−0.5) were positions 15 (end of first helix), 28 (helix-capping position), and 31 (start of second helix that forms hydrogen bonding with the backbone) (Fig. 3E and SI Appendix, Fig. S12). These positions indicate the importance of polar interactions toward stabilizing our designs and also support that the designs fold into their specific designed structures.
However, our mutational data also revealed some unexpectedly stable mutants (SI Appendix, Fig. S9). For example, we expected that mutants to G23 would be highly destabilizing because G23 should be critical for forming the central β-hairpin. However, in four of the six designs, mutants to G23 could actually increase folding stability (SI Appendix, Fig. S9 B–D and F). To investigate this, we predicted the structures of all mutant sequences using AlphaFold 2 (21). Although most mutants were predicted to have similar structures to the original design, some predictions (including mutants of G23) suggest the possibility of alternative, compact structures (SI Appendix, Fig. S13).
Modeling Relative Contributions of Biophysical Determinants on Folding Stability.
Our previous analysis identified individual determinants of stability without considering how various features relate to each other. Hence, we next analyzed which protein features were the most important contributors to stability and how they compared to each other. Instead of prioritizing predictive accuracy, we used linear regression to build a parsimonious, interpretable, low-resolution model. Our moderately accurate model (r = 0.64, r² = 0.41; Fig. 2H and SI Appendix, Table S3) included 10 features chosen for either their large individual contributions to stability or their biophysical interest. Adding all 25 additional Rosetta energy terms provides only a minimal improvement to this low-resolution model (SI Appendix, Table S4).
To analyze the strengths of the different features, we compared the different coefficients both in terms of their importance within our dataset (e.g., the impact of a one SD change in each term; Fig. 2 I, Left) and in terms of their biophysical strength (e.g., the impact of one additional residue, contact, charge, etc.; Fig. 2 I, Right). By representing the features in these two ways, we were able to observe how each feature contributes to a design’s stability while holding all other features constant. Relative to the variance in the features, the count of large nonpolar residues is the largest contributor to folding stability (Fig. 2I). Additional biophysical determinants known to stabilize globular proteins (1, 28–30), such as contacts between adjacent nonpolar residues and Ser/Thr helix capping, contribute to folding stability as well (Fig. 2I). However, our model also points to the stabilizing role of nonpolar residues at the design ends, which is a feature specific to the αββα topology (Fig. 2I). Whereas previous studies on the relative importance of stability determinants were based on assays that changed one feature on individual proteins (31, 32), our large-scale testing enabled us to analyze over a thousand protein features on several thousand proteins in parallel. This, in turn, allowed us to develop a model that offers criteria for designing even more stable αββα miniproteins.
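The two views of a regression coefficient described above differ only by a scale factor: the effect of a one-SD change in a feature equals the per-unit coefficient multiplied by that feature's SD across the dataset. The sketch below illustrates this relationship under the assumption that only the feature (not the stability score) is rescaled; the function name and the example numbers are hypothetical and are not taken from the paper's model.

```python
from statistics import pstdev


def per_sd_effect(per_unit_coef, feature_values):
    """Convert a per-unit coefficient (e.g., stability change per additional
    residue) into the stability change for a one-SD change in the feature."""
    return per_unit_coef * pstdev(feature_values)


# Hypothetical example: a feature that varies widely across designs can matter
# more within the dataset than a feature with a larger per-unit coefficient.
wide_feature = [0, 2, 4, 6]    # e.g., a residue count that varies a lot
narrow_feature = [1, 1, 2, 2]  # e.g., a count that barely varies
wide_effect = per_sd_effect(0.5, wide_feature)
narrow_effect = per_sd_effect(1.0, narrow_feature)
```

In this toy case the first feature has the smaller per-unit coefficient but the larger per-SD effect, which is why the two rankings in a comparison of this kind need not agree.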
Designing αββα Miniproteins Using a Diversity-Oriented Design Strategy.
Our restricted-design strategy (Round 5) focused on improving the success rate of designing stable αββα miniproteins but at the cost of reducing their structural diversity. Because we were now able to successfully generate stable αββα designs, we next investigated whether we could loosen the design restrictions that we had imposed, increase the diversity of our αββα miniproteins, and identify additional determinants of stability. Hence, we designed a new round of “diversity-oriented” (Round 6) αββα miniproteins based on 14 different protein architectures instead of one. This allowed designs to have a greater variety of helix, β-strand, and loop lengths, while keeping the overall size of the protein to 43 residues (Fig. 1B). In addition, we did not impose residue restrictions on β-strands or in the middle loop and permitted a greater number of hydrophobic residues.
Importantly, we used our Round 5 stability data to directly reweight the Rosetta energy function. Using ridge regression, we adjusted the weights on the Rosetta energy terms to create the best correlation with our measured Round 5 αββα stabilities, while regularizing the regression to penalize large deviations from the original weights. With this approach, we created three new energy functions labeled “Minor,” “Moderate,” and “Heavy” based on how much the weights deviated from the original weights. We used these three energy functions (and the original weights) to design our Round 6 designs (SI Appendix, Fig. S14).
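One way to express this kind of regularized reweighting is ridge regression with the penalty centered on the original weights rather than on zero, i.e., minimizing ||Xw − y||² + λ||w − w₀||². The sketch below is an assumption about the general form, not the authors' code: `X` stands for per-design energy-term values, `y` for a stability-derived target, `w0` for the original weights, and `lam` for the regularization strength, all hypothetical names. It uses the closed-form solution (XᵀX + λI)w = Xᵀy + λw₀.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def reweight(X, y, w0, lam):
    """Minimize ||Xw - y||^2 + lam * ||w - w0||^2 and return the weights w."""
    p = len(w0)
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) + lam * w0[i]
           for i in range(p)]
    return solve_linear(xtx, xty)
```

Under this formulation, a large `lam` keeps the fitted weights close to `w0` and a small `lam` lets them move further to fit the data, so a series of decreasing `lam` values would give progressively heavier reweighting.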
We generated ∼20,000 designs and chose our final set of over 5,000 αββα designs for experimental testing by identifying designs that had the greatest structural diversity, varied sequence identity (no closer than 28/43 residues), and an αββα topology as determined by the computer program PSIPRED (33). Notably, we prioritized structural diversity (SI Appendix, Fig. S2) in our final selection instead of prioritizing the expected success rate. The median sequence identity across all pairs of sequences was 28% (42% if only nonpolar residues are considered) (SI Appendix, Fig. S3). However, the diversity in amino acid composition (overall and nonpolar only) is lower than several known protein domains of similar sizes (SI Appendix, Fig. S4). Each design is named HEEH_KT_rd6_####, in which KT indicates the designer (author K.T.), rd6 indicates these designs constitute a new “Round 6” following the previous rounds of αββα design, and #### is a design number.
Stability Determinants of αββα Designs Based on Diversity-Oriented Design Strategy.
We tested the stabilities of our “diversity-oriented” αββα miniproteins (and matching scrambled sequences) using the high-throughput protease sensitivity assay (15). Surprisingly, 12% of our scrambled sequences had stability scores above 1, compared to 2% or fewer in previous rounds (Fig. 4A). We further found that scrambled sequences were most likely to be stable when they were very hydrophobic and when their sequences had high helical propensity as determined by DSSP (34, 35) (SI Appendix, Fig. S15). This suggested that designed sequences might also be stabilized by these properties alone, even if they did not fold into their designed structures. To remove these potential “false positive” designs from our analysis, we restricted our analysis to designs with a lower nonpolar residue count and lower helical propensity (SI Appendix, Fig. S15). Restricting our analysis in this way removed 25% of our total designs, while lowering the fraction of stable scrambles from 12% to 6% (Fig. 4B). The overall fraction of stable designs was 26%—still substantially above the “success rate” of the scrambled sequences (Fig. 4B).
Fig. 4.
Experimental testing and analysis of αββα stability determinants from a diversity-oriented design strategy. (A) Stability score distribution of αββα miniproteins (green) and scrambled sequences (gray). (B) As in A, filtered to eliminate designed and scrambled sequences that may fold into nondesigned structures; see text and SI Appendix, Fig. S15. The vertical line at stability score = 1 denotes the threshold above which we consider a design to be stable. (C and D) Stability scores and success frequencies of designs made with differently weighted Rosetta energy functions; “Heavy” indicates the largest amount of reweighting. (E) Rosetta scores (using the unmodified score function) of designs made using different weighting; the more positive scores of the designs from the reweighted energy functions indicate these designs are less favorable according to the default energy function. (F) Stability contribution of the most common loop patterns (using ABEGO notation) and β-strand lengths based on a linear regression model. (G) The most common unique structure combinations (loop pattern, β-strand, and helix lengths) are listed (Left) followed by the distribution of observed stability scores (Middle, with the expected stability from the linear regression model as a yellow dot). (Right) The fraction of stable designs for each unique structure. All error bars indicate 95% CIs from bootstrapping.
We then analyzed the impact of differently weighted Rosetta energy functions on folding stability. On average, designs made using the reweighted energy functions had higher stability than designs made with the default energy function (Fig. 4 C and D). However, some regularization (restraining the weights near their original values) was critical to successful reweighting: the “Heavy” energy function, where the changes to the weights were the largest, performed much more poorly than the energy functions with “Minor” and “Moderate” changes to the weights (Fig. 4 C and D). The success of the reweighted energy functions suggests that empirical reweighting could be an efficient practical tool for protein design in situations where large-scale data are available for a specific task. The designs created by the reweighted energy functions would not have been favored under our previous design procedure, with larger changes to the weights leading to designs that appear less and less favorable according to the default energy function (Fig. 4E). These reweighted “Minor” and “Moderate” energy functions also showed better correlation with previously published stabilities for other miniprotein topologies compared to the default score function (SI Appendix, Table S5).
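The role of regularization described above can be sketched with a simple ridge-style fit that penalizes departures from the default weights. This is only an illustration under stated assumptions: the energy function is treated as a linear combination of per-term energies, the target is any per-design quantity the user wants the weighted sum to track, and the function name and `strength` parameter are hypothetical. The actual reweighting procedure is described in the SI Appendix and may differ.

```python
import numpy as np

def reweight_toward_default(term_energies, target, default_weights, strength):
    """Fit term weights w minimizing ||X w - y||^2 + strength * ||w - w0||^2.

    term_energies   : (n_designs, n_terms) per-term energies (assumed input)
    target          : (n_designs,) quantity the weighted sum should track
    default_weights : (n_terms,) original weights w0
    strength        : regularization strength; larger values keep w near w0
    """
    X = np.asarray(term_energies, dtype=float)
    y = np.asarray(target, dtype=float)
    w0 = np.asarray(default_weights, dtype=float)
    n_terms = X.shape[1]
    # Closed-form solution of the penalized least-squares problem.
    lhs = X.T @ X + strength * np.eye(n_terms)
    rhs = X.T @ y + strength * w0
    return np.linalg.solve(lhs, rhs)
```

In this sketch, a small `strength` permits large changes to the weights, loosely analogous to the "Heavy" setting, whereas a large `strength` holds the weights close to their defaults.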
Next, we investigated how topological features (loop, β-strand, helix) of the designs affect folding stability. We selected the seven most common loop structures found in our designs (represented using ABEGO notation) (36) and the three most common β-strand lengths as inputs to another linear regression model (SI Appendix, Fig. S16 and Fig. 4F). The explanatory strength of this model is weak (95% CI from bootstrapping, mean r = 0.167, mean R² = 0.028). This weakness reflects both the simplicity of the model and its exclusion of critical stability determinants such as hydrophobic residue count. Despite these shortcomings, the model still enables us to examine the relative importance of different topological components. The largest structural contributors to stability are the lengths of β-strands and helices, with shorter β-strands (and correspondingly longer helices) as the most favorable topological parameter (β-strand and helix lengths are inversely related because all designs have a fixed length of 43 residues) (Fig. 4F). Secondarily, particular structures in loops 2 and 3 influenced folding stability as well. A loop structure of GBB in the first loop, GG in the second loop, and AB in the third loop increases the stability of a design more than other loop structures (Fig. 4F).
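A topology-only regression of this kind can be sketched as an ordinary least-squares fit on indicator (one-hot) variables. The encoding below, with one reference category dropped per feature and an intercept added, is an assumption made for illustration, as are the function names; the model used in the study may have been parameterized differently. For a least-squares fit with an intercept, R² equals the square of the correlation r between predicted and observed values, which is consistent with the reported pair r = 0.167 and R² = 0.028.

```python
import numpy as np

def one_hot(labels):
    """Return (indicator_matrix, categories), one column per sorted category."""
    categories = sorted(set(labels))
    matrix = np.array([[1.0 if lab == cat else 0.0 for cat in categories]
                       for lab in labels])
    return matrix, categories

def fit_topology_model(loop_patterns, strand_lengths, stability_scores):
    """Least-squares fit of stability score on loop pattern and strand length."""
    loops, loop_names = one_hot(loop_patterns)
    strands, strand_names = one_hot(strand_lengths)
    # Drop the first category of each feature as the reference level.
    intercept = np.ones((len(loop_patterns), 1))
    X = np.hstack([intercept, loops[:, 1:], strands[:, 1:]])
    y = np.asarray(stability_scores, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    predicted = X @ coef
    r = float(np.corrcoef(predicted, y)[0, 1])
    names = (["intercept"]
             + [f"loop={name}" for name in loop_names[1:]]
             + [f"strand={name}" for name in strand_names[1:]])
    return dict(zip(names, coef)), r, r ** 2
```

Each fitted coefficient is the estimated change in stability score relative to the reference category, which is one way to read a per-feature "stability contribution."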
Based on this topology-focused model, we would expect αββα miniproteins with a GBB-GG-AB loop patterning, β-strands that are four residues long, and helices that are 14-residues long to be more stable on average than αββα miniproteins with any other loop, strand, and helix combination (Fig. 4F). Although designs with a β-strand length of four residues were not common in our dataset, a very similar design structure (GBB-GG-AB with a β-strand length of three residues) had the highest average stability score and the highest success rate in our dataset (Fig. 4G), which is in agreement with a previous study on loop patterning and stability (37). In fact, this design pattern is the protein architecture that we used to generate all the Round 5 αββα miniproteins (SI Appendix, Fig. S1A). However, the high success of this architecture in Round 6 may be due to using reweighted energy functions that were optimized based on Round 5 designs with this specific architecture. Nonetheless, when we subset our Round 6 designs to identify αββα miniproteins with a GBB-GG-AB loop pattern and features that we previously determined to promote stability, these designs are diverse in their sequence identity and highly stable (81% successful) (Fig. 5). This provides a “recipe” for designing new stable αββα miniproteins in the future.
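The inverse relationship between β-strand and helix lengths follows from simple bookkeeping over the fixed 43-residue length. The sketch below assumes that each ABEGO letter corresponds to one loop residue, that there are two β-strands of equal length, and that no residues lie outside the helices, strands, and loops; the function name is hypothetical.

```python
TOTAL_LENGTH = 43  # fixed design length stated in the text

def total_helix_residues(loop_pattern, strand_length, n_strands=2):
    """Residues left for the helices after placing the loops and β-strands."""
    # One ABEGO letter per loop residue, e.g. "GBB-GG-AB" has 3 + 2 + 2 = 7.
    loop_residues = sum(len(loop) for loop in loop_pattern.split("-"))
    return TOTAL_LENGTH - loop_residues - n_strands * strand_length

# GBB-GG-AB with four-residue strands leaves 28 residues for two helices,
# matching the 14-residue helices described above.
helix_total = total_helix_residues("GBB-GG-AB", strand_length=4)
```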
Fig. 5.
A recipe for building diverse high-stability αββα designs. (A) Designs made from a diversity-oriented strategy are grouped into subsets based on five features that we identified to be important for stability (Figs. 2I and 4F). (B) The number of designs that comprise each subset. (C) The mean sequence identity between any two designs in each subset. (D) The fraction of successful designs in each subset, with error bars indicating 95% CIs from bootstrapping. Ideal designs (those with the parameters of Subset 5) are 80% successful with under 40% sequence identity between pairs of designs.
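The mean pairwise sequence identity shown in panel C can be sketched as follows. Because all designs share the same length, this illustration compares sequences position by position without alignment; that choice, and the function names, are assumptions rather than a description of the calculation used for the figure.

```python
from itertools import combinations

def sequence_identity(seq_a, seq_b):
    """Fraction of positions with identical residues (equal-length sequences)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)

def mean_pairwise_identity(sequences):
    """Average identity over all unordered pairs of sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(sequence_identity(a, b) for a, b in pairs) / len(pairs)
```

A subset would satisfy the diversity criterion in the caption when `mean_pairwise_identity` returns a value below 0.40.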
Predicting Stable De Novo αββα Miniproteins by AlphaFold 2.
When we designed and tested αββα miniproteins for their folding stability, AlphaFold 2 was not yet available. With its recent release (21), we wondered whether AlphaFold 2 could discriminate between stable and unstable miniproteins. We explored this possibility even though AlphaFold 2 is intended for structure prediction and not stability prediction. Out of the ∼5,600 and ∼4,000 restricted and diversity-oriented designs, respectively, we found that 78% of the former and 20% of the latter had at least one predicted structure within 2 Å RMSD to the designed model. These predictions were equally in agreement with design models regardless of whether a design was experimentally unstable, moderately stable, or stable, indicating that AlphaFold 2 did not discriminate stable from unstable designs (SI Appendix, Fig. S17). We also examined whether the Rosetta energy scores of the AlphaFold 2–predicted models were better correlated with experimental stability scores than the scores of the original design models. The AlphaFold 2–predicted models did not improve the correlation with experiment for the Round 5 design set but provided a small improvement for Round 6 (SI Appendix, Fig. S18 A–D). Neither RMSD nor AlphaFold 2’s average confidence measure (pLDDT) showed much ability to enrich for stable designs (SI Appendix, Fig. S18E), indicating that AlphaFold 2 is currently unable to determine the folding stability of these designed miniproteins.
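The RMSD comparison between predicted structures and design models can be illustrated with the standard Kabsch superposition. This is a generic sketch: it assumes matched (N, 3) coordinate arrays for the same atoms in both structures (for example Cα atoms, which is an assumption here), and the function names are hypothetical rather than taken from the study's pipeline.

```python
import numpy as np

def kabsch_rmsd(coords_a, coords_b):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    A = np.asarray(coords_a, dtype=float)
    B = np.asarray(coords_b, dtype=float)
    if A.shape != B.shape or A.ndim != 2 or A.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    # Remove translation by centering both sets on their centroids.
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    # Optimal rotation from the SVD of the covariance matrix (Kabsch).
    U, _, Vt = np.linalg.svd(A.T @ B)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # rotation mapping A onto B
    diff = A @ R.T - B
    return float(np.sqrt((diff ** 2).sum() / len(A)))

def matches_design(design_coords, predicted_models, cutoff=2.0):
    """True if at least one predicted model is within `cutoff` Å of the design."""
    return any(kabsch_rmsd(design_coords, model) <= cutoff
               for model in predicted_models)
```

`matches_design` mirrors the criterion above of at least one predicted structure within 2 Å RMSD of the designed model.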
Discussion
Understanding the biophysical determinants that enable proteins to fold and remain stable is important in protein design, drug development, and other areas. Here, we examined the stability determinants of the αββα miniprotein fold, which was previously identified as unusually challenging to design (15). We took advantage of an improved Rosetta design protocol (23, 24) to design over 10,000 αββα miniproteins using both restrictive and diversity-oriented design strategies. Our two design strategies led to over 3,000 new stable designs (∼2,100 restricted and ∼1,000 diversity-oriented designs) and a much higher success rate (38%; Fig. 2A) than the 2% success previously reported (15). Our designed proteins also had a much higher success rate than control sequences with identical residue composition and polar–nonpolar patterning. This suggests that their stability was conferred by their designed three-dimensional structures. Supporting this, NMR structures of two designs closely matched the designed models (below 2 Å backbone RMSD; Fig. 3), CD spectra of six designs were consistent with the designed structures (Fig. 3C), and deep mutational scanning analysis of 5/6 designs showed specific sequence–stability relationships that were consistent with the designed structures. However, the lower resolution of CD and mutational scanning cannot directly demonstrate the atomic accuracy of the designs.
Our large dataset of stable designs enabled us to quantify determinants of stability for the previously challenging αββα fold (Figs. 2I and 4 F and G). Most of the stability determinants were common across globular proteins (1, 28, 29, 32, 37–40) and similar to those previously observed in large-scale de novo design experiments (15). We also identified that designing hydrophobic residues near the termini was especially important for the αββα miniprotein fold (Fig. 2I). Our design success rate improved substantially when we used our large dataset to reweight the Rosetta energy function specifically for αββα design (Fig. 4 C–E). These observations largely explain the low success of previously designed αββα proteins: Previous designs frequently employed nonoptimal loop patterns, helix capping residues, and residues near the design termini and typically had 13 to 16 nonpolar residues rather than the 17 to 20 used here (SI Appendix, Fig. S1). Notably, the total number of nonpolar residues in each design is influenced by the design energy function and by parameters that restrict the amino acids that are sampled at each position according to the solvent accessibility of that position (15, 41). These restrictions are manually tuned to balance stability and solubility, as well as to reduce the search space of sequences. Designing proteins with too few nonpolar residues can thus be considered a failure of manual tuning as well as a failure of the design energy function.
Our study has several notable limitations. First, some fraction of “stable” designs are likely stable for nondesigned reasons, such as folding into an alternative structure, forming a compact “molten” state, or aggregating on the surface of yeast. In our diversity-oriented set, 6% of our scrambled sequences met our stability threshold, compared with 25% of designs (Fig. 4B). Naively, this suggests that roughly one in four stable designs (6%/25%) could be stable for nondesigned reasons. In addition, three of the six designs based on the restrictive protocol exhibited some oligomeric species when purified from E. coli (Fig. 3B), suggesting designs might be stabilized by intermolecular interactions. Because our regression analysis assumes that each design’s stability (or lack thereof) is due to its designed monomeric structure, our analysis will be unreliable if nondesigned structures or interactions played an important role in our observed stabilities. Still, our regression analysis was able to identify specific three-dimensional features as stabilizing or destabilizing, such as buried unsatisfied polar atoms and attractive or repulsive ion pairs (Fig. 2I).
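The naive estimate in the first limitation is a ratio of two success rates. The sketch below assumes that scrambled sequences and designs become stable for nondesigned reasons at the same rate, which is the simplification implied by the word "naively"; the function name is hypothetical.

```python
def naive_nondesigned_fraction(stable_scramble_rate, stable_design_rate):
    """Estimated share of stable designs that are stable for nondesigned reasons.

    Assumes the scramble success rate equals the rate at which designs are
    stable without adopting their designed structure.
    """
    if not 0.0 <= stable_scramble_rate <= 1.0:
        raise ValueError("stable_scramble_rate must lie in [0, 1]")
    if not 0.0 < stable_design_rate <= 1.0:
        raise ValueError("stable_design_rate must lie in (0, 1]")
    return min(1.0, stable_scramble_rate / stable_design_rate)

# Rates quoted in the text for the filtered diversity-oriented set.
estimate = naive_nondesigned_fraction(0.06, 0.25)  # about 0.24
```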
Second, our findings regarding the determinants of stability are limited to the specific context we examined: αββα miniproteins designed by a particular computational procedure. The samples of designs that we tested were not random: They were designed to be high-stability and showed variation across some dimensions but not others. If a biophysical property (such as backbone torsional strain or higher polarity) varied only minimally across our design set, we would not be able to identify the contribution of that feature to stability. An alternative design procedure might also generate structures in a different region of “property space,” permitting high-stability designs that are different from the recipe described in Fig. 5. Constructing a fully general model of folding stability will ultimately require a broad sampling of sequences, structures, and biophysical properties. Our work here investigating a specific design space suggests that this should be possible.
Despite these limitations, our study demonstrates how large-scale experimental testing can be applied to solve a challenging design problem and to quantify the biophysical features that influence design stability. In contrast to other studies that use mutagenesis to study determinants of folding stability (19, 42–44), our method examines the strengths of different biophysical features across thousands of different protein contexts, although these contexts are all related by the αββα fold and design procedure. Simplified low-resolution models like our linear regression are valuable for building biophysical intuition about the strengths of different interactions (45, 46) as well as for guiding the construction of high-resolution models like the Rosetta energy function, which is also an additive model (47). Our stable αββα designs (and our recipe for generating more) may also be valuable scaffolds for engineering binding functionality for therapeutic, diagnostic, and synthetic biology applications (12, 48, 49).
Materials and Methods
αββα miniproteins were designed using Rosetta, based on our previous work (15), the Rosetta protocol FastDesign (24), the beta_nov16_protease version of the full-atom energy function, and a recently improved sampling method (50). We purchased DNA oligo libraries from Agilent. DNA amplification, yeast display-based proteolysis, sorting, next-generation sequencing, and calculating the “stability score” for each miniprotein were all performed as described previously (15). For CD, NMR, and HDX analysis, we purchased genes encoding six miniproteins from Twist Bioscience, expressed them in BL21(DE3) competent cells, and purified the proteins by nickel-column affinity chromatography and size-exclusion chromatography. See SI Appendix for detailed computational and experimental methods. All files and datasets are provided at https://github.com/kimte1/abba_protein_stability_manuscript.
Acknowledgments
This work was supported, in part, by the National Institute of General Medical Sciences through award number 1DP2GM140927 and award number 5T32GM105538. K.T. was supported by Japan Society for the Promotion of Science KAKENHI grant number 19J30003 and JST PRESTO Grant JPMJPR21E9, and is currently supported by a Human Frontier Science Program Long-Term Fellowship. Computational work was supported in part through the resources and staff contributions provided for the Quest high performance computing facility at Northwestern University (which is jointly supported by the Office of the Provost, the Office for Research, and Northwestern University Information Technology). NMR work was performed by the Structural Genomics Consortium, a registered charity (no. 1097737) that receives funds from Bayer AG, Boehringer Ingelheim, Bristol Myers Squibb, Genentech, Genome Canada through Ontario Genomics Institute (OGI-196), EU/EFPIA/OICR/McGill/KTH/Diamond Innovative Medicines Initiative 2 Joint Undertaking (EUbOPEN grant 875510), Janssen, Merck KGaA (aka EMD in Canada and United States), Pfizer, and Takeda. Yeast display selections and next-generation sequencing were performed by the University of Washington BioFab. CD spectroscopy was performed using Northwestern University’s Keck Biophysics Facility. We thank the members of the G.J.R. laboratory for discussions and comments on this manuscript.
Footnotes
The authors declare no competing interest.
This article is a PNAS Direct Submission.
This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2122676119/-/DCSupplemental.
Data, Materials, and Software Availability
Files and datasets are provided at https://github.com/kimte1/abba_protein_stability_manuscript (51). All other study data are included in the article and/or SI Appendix. NMR structures are deposited in the Protein Data Bank (PDB) (accession codes: 7T2F and 8DOA) (52, 53), and in the Biological Magnetic Resonance Data Bank (BMRB) (accession codes: 30974 and 31033) (54, 55).
References
- 1. Dill K. A., Dominant forces in protein folding. Biochemistry 29, 7133–7155 (1990).
- 2. Goldenzweig A., Fleishman S. J., Principles of protein stability and their application in computational design. Annu. Rev. Biochem. 87, 105–129 (2018).
- 3. Arai M., Unified understanding of folding and binding mechanisms of globular and intrinsically disordered proteins. Biophys. Rev. 10, 163–181 (2018).
- 4. Huang P.-S., Boyken S. E., Baker D., The coming of age of de novo protein design. Nature 537, 320–327 (2016).
- 5. Boyken S. E., et al., De novo design of protein homo-oligomers with modular hydrogen-bond network-mediated specificity. Science 352, 680–687 (2016).
- 6. Pan X., et al., Expanding the space of protein geometries by computational design of de novo fold families. Science 369, 1132–1136 (2020).
- 7. Goldenzweig A., et al., Automated structure- and sequence-based design of proteins for high bacterial expression and stability. Mol. Cell 63, 337–346 (2016).
- 8. Wiese J. G., Shanmugaratnam S., Höcker B., Extension of a de novo TIM barrel with a rationally designed secondary structure element. Protein Sci., 10.1002/pro.4064 (2021).
- 9. Lalaurie C. J., et al., The de novo design of a biocompatible and functional integral membrane protein using minimal sequence complexity. Sci. Rep. 8, 14564 (2018).
- 10. Nisthal A., Wang C. Y., Ary M. L., Mayo S. L., Protein stability engineering insights revealed by domain-wide comprehensive mutagenesis. Proc. Natl. Acad. Sci. U.S.A. 116, 16367–16377 (2019).
- 11. Broom A., Trainor K., Jacobi Z., Meiering E. M., Computational modeling of protein stability: Quantitative analysis reveals solutions to pervasive problems. Structure 28, 717–726.e3 (2020).
- 12. Chevalier A., et al., Massively parallel de novo protein design for targeted therapeutics. Nature 550, 74–79 (2017).
- 13. Brini E., Simmerling C., Dill K., Protein storytelling through physics. Science 370, eaaz3041 (2020).
- 14. Bryan C. M., et al., Computational design of a synthetic PD-1 agonist. Proc. Natl. Acad. Sci. U.S.A. 118, e2102164118 (2021).
- 15. Rocklin G. J., et al., Global analysis of protein folding using massively parallel design, synthesis, and testing. Science 357, 168–175 (2017).
- 16. Dou J., et al., De novo design of a fluorescence-activating β-barrel. Nature 561, 485–491 (2018).
- 17. Basanta B., et al., An enumerative algorithm for de novo design of proteins with diverse pocket structures. Proc. Natl. Acad. Sci. U.S.A. 117, 22135–22145 (2020).
- 18. Linsky T., et al., Sampling of structure and sequence space of small protein folds. bioRxiv [Preprint] (2021). https://www.biorxiv.org/content/10.1101/2021.03.10.434454v1 (Accessed 18 March 2021).
- 19. Singer J. M., et al., Large-scale design and refinement of stable proteins using sequence-only models. PLoS One 17, e0265020 (2022).
- 20. Maxwell K. L., et al., The solution structure of bacteriophage lambda protein W, a small morphogenetic protein possessing a novel fold. J. Mol. Biol. 308, 9–14 (2001).
- 21. Jumper J., et al., Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
- 22. Huang P.-S., et al., RosettaRemodel: A generalized framework for flexible backbone protein design. PLoS One 6, e24109 (2011).
- 23. Park H., et al., Simultaneous optimization of biomolecular energy functions on features from small molecules and macromolecules. J. Chem. Theory Comput. 12, 6201–6212 (2016).
- 24. Maguire J. B., et al., Perturbing the energy landscape for improved packing during computational protein design. Proteins 89, 436–449 (2021).
- 25. Monera O. D., Sereda T. J., Zhou N. E., Kay C. M., Hodges R. S., Relationship of sidechain hydrophobicity and α-helical propensity on the stability of the single-stranded amphipathic α-helix. J. Pept. Sci. 1, 319–329 (1995).
- 26. Huyghues-Despointes B. M. P., Scholtz J. M., Pace C. N., Protein conformational stabilities can be determined from hydrogen exchange rates. Nat. Struct. Biol. 6, 910–912 (1999).
- 27. Negin R. S., Carbeck J. D., Measurement of electrostatic interactions in protein folding with the use of protein charge ladders. J. Am. Chem. Soc. 124, 2911–2916 (2002).
- 28. Serrano L., Fersht A. R., Capping and α-helix stability. Nature 342, 296–299 (1989).
- 29. Wan W.-Y., Milner-White E. J., A recurring two-hydrogen-bond motif incorporating a serine or threonine residue is found both at α-helical N termini and in other situations. J. Mol. Biol. 286, 1651–1662 (1999).
- 30. Nick Pace C., Scholtz J. M., Grimsley G. R., Forces stabilizing proteins. FEBS Lett. 588, 2177–2184 (2014).
- 31. Pace C. N., et al., Contribution of hydrogen bonds to protein stability. Protein Sci. 23, 652–661 (2014).
- 32. Pace C. N., et al., Contribution of hydrophobic interactions to protein stability. J. Mol. Biol. 408, 514–528 (2011).
- 33. Jones D. T., Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292, 195–202 (1999).
- 34. Kabsch W., Sander C., Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577–2637 (1983).
- 35. Touw W. G., et al., A series of PDB-related databanks for everyday needs. Nucleic Acids Res. 43, D364–D368 (2015).
- 36. Wintjens R. T., Rooman M. J., Wodak S. J., Automatic classification and analysis of αα-turn motifs in proteins. J. Mol. Biol. 255, 235–253 (1996).
- 37. Lin Y.-R., et al., Control over overall shape and size in de novo designed proteins. Proc. Natl. Acad. Sci. U.S.A. 112, E5478–E5485 (2015).
- 38. Chakrabartty A., Doig A. J., Baldwin R. L., Helix capping propensities in peptides parallel those in proteins. Proc. Natl. Acad. Sci. U.S.A. 90, 11332–11336 (1993).
- 39. Kurnik M., Hedberg L., Danielsson J., Oliveberg M., Folding without charges. Proc. Natl. Acad. Sci. U.S.A. 109, 5705–5710 (2012).
- 40. Gavrilov Y., Dagan S., Levy Y., Shortening a loop can increase protein native state entropy. Proteins 83, 2137–2146 (2015).
- 41. Koga N., et al., Principles for designing ideal protein structures. Nature 491, 222–227 (2012).
- 42. Baker E. G., et al., Engineering protein stability with atomic precision in a monomeric miniprotein. Nat. Chem. Biol. 13, 764–770 (2017).
- 43. Trotter D., Wallin S., Effects of topology and sequence in protein folding linked via conformational fluctuations. Biophys. J. 118, 1370–1380 (2020).
- 44. Marin F. I., Johansson K. E., O’Shea C., Lindorff-Larsen K., Winther J. R., Computational and experimental assessment of backbone templates for computational redesign of the thioredoxin fold. J. Phys. Chem. B 125, 11141–11149 (2021).
- 45. Bellesia G., Jewett A. I., Shea J. E., Relative stability of de novo four-helix bundle proteins: insights from coarse grained molecular simulations. Protein Sci. 20, 818–826 (2011).
- 46. Ha-Duong T., “Coarse-grained models of the proteins backbone conformational dynamics” in Protein Conformational Dynamics, Han K., Zhang X., Yang M., Eds. (Advances in Experimental Medicine and Biology, Springer International Publishing, 2014), pp. 157–169.
- 47. Alford R. F., et al., The Rosetta all-atom energy function for macromolecular modeling and design. J. Chem. Theory Comput. 13, 3031–3048 (2017).
- 48. Cao L., et al., De novo design of picomolar SARS-CoV-2 miniprotein inhibitors. Science 370, 426–431 (2020).
- 49. Cao L., et al., Design of protein-binding proteins from the target structure alone. Nature 605, 551–560 (2022).
- 50. Pavlovicz R. E., Park H., DiMaio F., Efficient consideration of coordinated water molecules improves computational protein-protein and protein-ligand docking discrimination. PLOS Comput. Biol. 16, e1008103 (2020).
- 51. Kim T. E., et al., Data for “Dissecting the stability determinants of a challenging de novo protein fold using massively parallel design and experimentation.” GitHub. https://github.com/kimte1/abba_protein_stability_manuscript. Deposited 5 August 2022.
- 52. Kim T. E., et al., 7T2F, Solution structure of the model HEEH mini protein homodimer HEEH_TK_rd5_0341. PDB. https://www.rcsb.org/structure/unreleased/7T2F. Deposited 4 December 2021.
- 53. Kim T. E., et al., 8DOA, Solution structure of a model HEEH mini-protein (HEEH_TK_rd5_0958). PDB. https://www.rcsb.org/structure/unreleased/8DOA. Deposited 12 July 2022.
- 54. Kim T. E., et al., Data for “Dissecting the stability determinants of a challenging de novo protein fold using massively parallel design and experimentation.” BMRB. https://legacy.bmrb.io/data_library/summary/index.php?bmrbId=30974. Accessed 22 September 2022.
- 55. Kim T. E., et al., Data for “Dissecting the stability determinants of a challenging de novo protein fold using massively parallel design and experimentation.” BMRB. https://legacy.bmrb.io/data_library/summary/index.php?bmrbId=31033. Accessed 22 September 2022.