Proceedings of the National Academy of Sciences of the United States of America. 2024 Aug 15;121(34):e2321999121. doi: 10.1073/pnas.2321999121

The importance of the location of the N-terminus in successful protein folding in vivo and in vitro

Natalie R Dall a,b, Carolina A T F Mendonça b, Héctor L Torres Vera a, Susan Marqusee a,b,c,1
PMCID: PMC11348275  PMID: 39145938

Significance

Proteins can fold cotranslationally, while still a ribosome nascent chain. This vectorial nature of protein synthesis can impact soluble, functional protein expression and interactions with proteostasis factors. To probe the importance of the N-terminus in this process, we generated all possible circular permutants (proteins with the same sequence and an altered start site) of a protein and monitored the changes in folding and soluble expression. We find that function, folding, and solubility are impacted most by termini insertion in the fast-folding core of the protein, especially regions with low mutational tolerance and high evolutionary coupling. These changes are robust to deletion of specific proteostasis factors. The structural details of the folding intermediate can also be altered by circular permutation.

Keywords: protein folding, circular permutation, cotranslational folding, aggregation

Abstract

Protein folding in the cell often begins during translation. Many proteins fold more efficiently cotranslationally than when refolding from a denatured state. Changing the vectorial synthesis of the polypeptide chain through circular permutation could impact functional, soluble protein expression and interactions with cellular proteostasis factors. Here, we measure the solubility and function of every possible circular permutant (CP) of HaloTag in Escherichia coli cell lysate using a gel-based assay, and in living E. coli cells via FACS-seq. We find that 78% of HaloTag CPs retain protein function, though a subset of these proteins are also highly aggregation-prone. We examine the function of each CP in E. coli cells lacking the cotranslational chaperone trigger factor and the intracellular protease Lon and find no significant changes in function as a result of modifying the cellular proteostasis network. We then biophysically characterize two topologically interesting CPs in vitro via circular dichroism and hydrogen–deuterium exchange coupled with mass spectrometry to reveal changes in global stability and folding kinetics with circular permutation. For CP33, we identify a change in the refolding intermediate as compared to wild-type (WT) HaloTag. Finally, we show that the strongest predictor of aggregation-prone expression in cells is the introduction of termini within the refolding intermediate. These results, in addition to our finding that termini insertion within the conformationally restrained core is most disruptive to protein function, indicate that successful folding of circular permutants may depend more on changes in folding pathway and termini insertion in flexible regions than on the availability of proteostasis factors.


In the cell, most proteins have their first opportunity to fold during their synthesis via translation. Protein synthesis occurs orders of magnitude slower than the rate of formation of α-helices and β-strands, which allows for sampling of various conformations as the nascent polypeptide chain emerges from the ribosome (1–5). Recent studies of this cotranslational process have demonstrated secondary structure formation within the ribosome exit tunnel and during translation (cotranslational folding) (6–10). Therefore, this vectorial nature of protein synthesis from N to C terminus generates a potential bias in the conformations available to the growing nascent chain, with the ability to influence the overall folding trajectory and efficiency (11–14). Several proteins have been shown to fold more efficiently (producing more soluble product) cotranslationally as compared to refolding in vitro (15). For example, firefly luciferase and many versions of Green Fluorescent Protein (GFP) are both highly aggregation-prone during refolding, yet efficiently produce soluble protein when translated in vitro or when overexpressed in cells (16–18).

In vivo, the folding process is also modulated by cellular factors, namely chaperones and other parts of the proteostasis network (19). For example, soluble refolding of firefly luciferase can be rescued by introducing the bacterial chaperone system DnaK/DnaJ/GrpE to in vitro refolding experiments (20). Chaperones such as trigger factor and DnaK/DnaJ (HSP70 and HSP40 in mammalian systems) can interact with nascent chains to block the formation of nonproductive, kinetically trapped intermediates during cotranslational and posttranslational folding (21–23). Protein expression can also be modulated by translation speed, as changing the rate of translation has been seen to impact expression yields and folding pathways (24–26). Thus, folding in the cell is highly dependent on several factors external to the sequence of the growing polypeptide chain. Despite this, we know very little about the importance of which part of the protein is translated first (that is, where the N-terminus resides within the overall fold of the protein).

The order of protein translation can be modified via circular permutation. Circular permutants (CPs) are created, often at the DNA level, by connecting the N and C termini with a flexible amino acid linker and introducing the termini elsewhere in the protein (27–29). For proteins where the N and C termini are close in the three-dimensional structure, the circular permutant is often structurally similar to the wild-type protein and often retains function. It is important to note that although the native structure may be resilient to permutation, the energy landscape of the protein may be drastically perturbed. For instance, circular permutation has been observed to affect aggregation propensity, quaternary structure, catalytic activity, folding, and even proteolytic susceptibility in different cell types (30–34). Importantly, CPs have the potential to form novel cotranslational folding intermediates not observed in the wild-type protein. Thus, permutation has the potential to bias the folding pathway. This may alter protein interactions with chaperones and other members of the proteostasis network, although to date, we are unaware of any experimental studies looking at the effects of chaperones on circular permutant folding.

In previous studies, work from our lab demonstrated that cotranslational folding of the protein HaloTag is less aggregation-prone than when refolded in vitro from chemical denaturant (35). This cotranslational folding was monitored in an in vitro purified translation system (PURExpress®) in the absence of any cellular factors such as chaperones or proteases. HaloTag is a 297-residue bacterial haloalkane dehalogenase engineered to covalently bind halogenated ligands (36). HaloTag contains a core subdomain made up of an α/β sheet with eight β-strands connected by α-helices, and a helical lid subdomain inserted between β6 and β7 (37, 38) (Fig. 1A). When refolded from denaturant, HaloTag populates an aggregation-prone refolding intermediate with structural elements from both N- and C-terminal regions (35). These results suggest that the vectorial nature of translation may favor local structure formation by the nascent chain and thereby avoid the formation of this aggregation-prone partially folded structure, as the C-terminal intermediate residues will be sequestered in the ribosome until translational termination is complete.

Fig. 1.


Circular permutation of HaloTag impacts function and solubility in E. coli. (A) Left: Structure of HaloTag bound to the ligand TMR (yellow sticks, PDB: 6u32). Right: Secondary structure topology map of HaloTag. Helices shown in dark blue, β-strands shown in light blue. Spheres: native termini (black), CP33 (orange), CP36 (dark pink), CP39 (teal), and CP217 (purple). Yellow sphere indicates D106, the catalytic residue covalently attached to TMR. (B) Representative gel-based assay examining solubility and function of HaloTag CPs. CPs showing TMR fluorescence in top gel image are functional. CPs showing equal amounts of protein in the supernatant fraction (S) as the whole cell lysate (L) are soluble, while CPs less abundant in the supernatant are aggregation-prone. Black arrows show CP bands. (C) Locations of CPs with high levels of function (pink), low levels of function (purple), or no functional activity (black). (D) Locations of the most aggregation-prone CPs. Teal CPs are highly aggregation-prone and highly functional, medium blue CPs are highly aggregation-prone with low functional levels, and dark blue CPs are aggregation-prone and nonfunctional. (E) Gel quantification scores for every CP. A one-dimensional topology map is shown above, with colored arrows indicating positions of CPs highlighted in (A and B). Functional scores are calculated based on the relative amounts of TMR fluorescence:protein staining as quantified in ImageJ. Lines indicate the WT score (-), 20% of WT (--), and 1% of WT (••). We define functioning CPs as those with scores above 20% of WT (pink), low-function CPs have scores between 1% and 20% of the WT score (purple), and nonfunctioning CPs have scores of 0 to 1% of the WT level (black or no bar). Solubility scores represent the ratio of protein in the supernatant and lysate fractions, are normalized to WT HaloTag, and log2-transformed, so scores closer to zero indicate higher solubility. Lines indicate the WT score (-) and 33% of WT (-•-). CPs with scores below the 33% WT line (navy) are highly aggregation-prone.

Here, we have taken a comprehensive approach to evaluate the effect of circular permutation on a protein, its ability to fold and function, and its reliance on the translational machinery and cellular factors when expressed in cells. We constructed all 297 possible circular permutants of HaloTag and evaluated their function in multiple cellular environments. To quantify solubility and function, we use both a biochemical gel assay with Escherichia coli cell lysate and a high-throughput screen of protein function via fluorescence-activated cell sorting coupled with next-generation sequencing (FACS-seq). To evaluate the role of cellular factors, specifically the cotranslational chaperone trigger factor and the intracellular protease Lon, we carried out the FACS-seq experiments using E. coli cells lacking these specific factors. Finally, we selected two circular permutants for detailed biophysical characterization to evaluate the effects of termini relocation on structure, folding, and stability (the energy landscape). Combining these results allows us to identify trends that correlate with the function and solubility of protein CPs, which carry implications for protein engineering, termini placement in protein design, and the evolution of complex protein structures.

Results

Circular Permutation of HaloTag Differentially Impacts Protein Solubility and Function.

To evaluate the role of circular permutation in protein folding, we generated a library of 297 plasmids encoding all potential circular permutants of the protein HaloTag (SI Appendix, Fig. S1A). For each permutant, an eight-residue linker [GT(GS)3] was appended to the C-terminus of the wild-type protein and then connected to residue 1 of the wild-type sequence. Each circular permutant is identified by the wild-type residue that is the new N-terminus of the protein. For example, CP33 refers to the circular permutant starting at residue 33, and CP1 refers to the wild-type sequence with the added C-terminal GS linker.
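The construction rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' cloning pipeline: the function names and the ten-residue toy sequence are hypothetical, and practical details such as adding an initiator methionine are not handled.

```python
# Linker GT(GS)3 written out residue by residue (eight residues, as described in the text).
LINKER = "GTGSGSGS"


def circular_permutant(wt_seq: str, start: int, linker: str = LINKER) -> str:
    """Return the sequence of CP<start>.

    `start` is the 1-based wild-type residue that becomes the new N-terminus.
    The wild-type C-terminus is joined to wild-type residue 1 through the linker.
    """
    if not 1 <= start <= len(wt_seq):
        raise ValueError("start must be between 1 and the wild-type length")
    i = start - 1
    return wt_seq[i:] + linker + wt_seq[:i]


def all_permutants(wt_seq: str, linker: str = LINKER) -> dict:
    """Build every circular permutant, keyed by name (CP1, CP2, ...)."""
    return {
        f"CP{n}": circular_permutant(wt_seq, n, linker)
        for n in range(1, len(wt_seq) + 1)
    }


# Toy sequence for illustration only (NOT the HaloTag sequence).
toy = "ACDEFGHIKL"
library = all_permutants(toy)
```

With this convention, `CP1` is simply the wild-type sequence followed by the linker, and every permutant has the same length (wild-type length plus eight).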

We developed a simple gel-based assay to assess the effect of circular permutation on folding, function, and the propensity to aggregate in E. coli cell lysate (SI Appendix, Fig. S1B). This assay takes advantage of the fact that HaloTag is an engineered haloalkane dehalogenase that requires folding and formation of the active site to covalently bind halogenated ligands (Fig. 1A), and therefore irreversible binding of the fluorescent HaloTag tetramethylrhodamine ligand (TMR) can be monitored as a readout of protein function. In brief, whole-cell lysates (freeze-thaw lysis) and the clarified lysates are stained with TMR, separated by SDS-PAGE, and then analyzed by a) TMR fluorescence to quantify function and b) Coomassie staining to quantify the amount of soluble protein (Fig. 1B and SI Appendix, Fig. S1 C–F). While several CPs showed high solubility and functional levels (e.g., CP217, Fig. 1B), some functional CPs were aggregation-prone in cells. For example, CP33 is less abundant in the soluble fraction than in the whole cell lysate, but protein in the soluble fraction can nevertheless bind TMR (Fig. 1B). Other CPs such as CP36 and CP39 are nonfunctional, and show different levels of aggregation propensity (Fig. 1B).

Function and solubility scores were assigned for every circular permutant (Fig. 1 C–E, see Dataset S1 for raw scores). In total, 78% of HaloTag CPs can function in E. coli cell lysate (Fig. 1C), with 180 CPs retaining a high level of TMR binding and another 53 CPs showing low levels of TMR binding (SI Appendix, Fig. S2A, CP44 vs. CP45). CPs with termini introduced in the lid or in core loops generally retain functional activity, while core CPs with termini introduced within β-strands or breaking helices are often nonfunctional (Fig. 1 C and E). Interestingly, when cells were lysed with a detergent-based protein extraction reagent (BugBuster) as compared to freeze-thaw, 118 CPs (mostly in the lid region) showed a decrease in function and change in category, suggesting the detergents present in this lysis buffer could interfere with proper irreversible TMR binding (SI Appendix, Figs. S2 and S3). Ninety CPs are considered highly aggregation-prone, and have termini introduced within core β-strands or within core helices (Fig. 1 D and E). In sum, these results show that circular permutation can differentially affect both solubility and function of HaloTag in E. coli, with the core subdomain showing low tolerance to CP with respect to soluble, functional protein expression.
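The scoring scheme summarized in the Fig. 1E legend (function as a TMR fluorescence:protein staining ratio relative to WT, solubility as a log2-transformed supernatant:lysate ratio normalized to WT) can be written out as a short sketch. The function names are hypothetical, the inputs stand in for band intensities quantified in ImageJ, and the handling of values exactly at a threshold is an assumption, since the legend does not specify it.

```python
import math


def functional_score(tmr_signal: float, protein_signal: float, wt_ratio: float) -> float:
    """TMR fluorescence : protein staining ratio, expressed relative to WT (WT = 1.0)."""
    return (tmr_signal / protein_signal) / wt_ratio


def function_category(score_rel_wt: float) -> str:
    """Apply the 20% and 1% of WT cutoffs from the figure legend.

    Assumption: values exactly at 1% count as low-function.
    """
    if score_rel_wt > 0.20:
        return "functional"
    if score_rel_wt >= 0.01:
        return "low-function"
    return "nonfunctional"


def solubility_score(supernatant: float, lysate: float, wt_ratio: float) -> float:
    """log2 of the supernatant:lysate ratio normalized to WT (WT scores 0)."""
    return math.log2((supernatant / lysate) / wt_ratio)


def is_highly_aggregation_prone(sol_score: float) -> bool:
    """True when the score falls below the 33%-of-WT line, i.e. below log2(0.33)."""
    return sol_score < math.log2(0.33)


# Hypothetical band intensities, for illustration only.
example_function = function_category(functional_score(50.0, 100.0, wt_ratio=1.0))
example_aggregating = is_highly_aggregation_prone(solubility_score(0.2, 1.0, wt_ratio=1.0))
```

Because the solubility score is a log2 ratio, WT sits at zero and more negative values indicate a larger insoluble fraction.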

We wondered whether the CPs that aggregate into inclusion bodies in our cell lysates might fold more efficiently during in vitro refolding. Therefore, we selected aggregation-prone CPs to determine whether functional protein could be recovered from these inclusion bodies (SI Appendix, Fig. S4A). We first chose CP119 and CP277, which both form functional proteins, but show low levels of soluble expression in E. coli. For CP119, the insoluble fraction did not solubly refold, while CP277 did (SI Appendix, Fig. S4 B and C), indicating that CP119 only folds efficiently during cotranslational folding, while CP277 folds potentially more efficiently during refolding than during cotranslational folding. This experiment was repeated with 32 other aggregation-prone CPs: four proteins could refold from inclusion bodies and bind TMR (SI Appendix, Fig. S4 C and D, blue), four produced nonfunctional soluble protein (SI Appendix, Fig. S4 C and D, orange), and the other 24 did not refold solubly (SI Appendix, Fig. S4 C and D, black).

In order to easily and rapidly assay protein function in different cell types, we developed a high-throughput approach, transforming the CP library into living cells, staining with TMR, and sorting via FACS-seq based on in-cell TMR fluorescence. We first screened the HaloTag CP library in E. coli BL21 (DE3) cells (the same strain as used in gel analyses) (SI Appendix, Fig. S5) and then evaluated strains lacking specific proteostasis factors (see below). To quantify which CPs were functional in each cell type, we calculated an enrichment score for the high-fluorescence cell populations. It is important to note that these FACS-seq readouts only report on the overall level of TMR fluorescence and cannot distinguish subtleties that may arise from differences in protein levels vs. differences in protein function. Therefore, these functional scores represent a combination of protein function, solubility, and expression in cells. We used the DESeq2 RNA-sequencing analysis package (39), which calculates log-fold changes between the unsorted and sorted populations and makes score adjustments based on the depth of the sequencing reads for each variant. Fig. 2A shows the calculated enrichment scores for each CP, with yellow boxes drawn to mark contiguous stretches of functional CPs in the BL21 FACS-seq dataset; CPs with scores greater than zero are considered functional. As anticipated, we see agreement between our in vitro gel assay functional scores and the enrichment scores calculated from our BL21 FACS-seq dataset (Fig. 2B, corrected two-sided P-values in SI Appendix, Fig. S6 (40), Pearson 0.52 < r < 0.54). Given the indirect nature of the FACS-seq assay, the detailed biochemical analysis is likely the more accurate assessment of protein function.
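To make the idea of an enrichment score concrete, the sketch below computes a plain log2 fold change between sorted and unsorted read counts after normalizing for sequencing depth. It is deliberately not DESeq2: DESeq2 fits a negative binomial model and estimates size factors and dispersions, none of which is reproduced here. The function name, the pseudocount, and the read counts are all hypothetical.

```python
import math


def log2_enrichment(sorted_counts: dict, unsorted_counts: dict, pseudocount: float = 0.5) -> dict:
    """Depth-normalized log2 fold change (sorted vs. unsorted) for each variant.

    Simplified stand-in for a DESeq2 log-fold change; the pseudocount avoids
    division by zero for variants that are absent from one population.
    """
    total_sorted = sum(sorted_counts.values())
    total_unsorted = sum(unsorted_counts.values())
    scores = {}
    for cp, n_unsorted in unsorted_counts.items():
        freq_sorted = (sorted_counts.get(cp, 0) + pseudocount) / total_sorted
        freq_unsorted = (n_unsorted + pseudocount) / total_unsorted
        scores[cp] = math.log2(freq_sorted / freq_unsorted)
    return scores


# Hypothetical read counts, for illustration only.
unsorted_reads = {"CP1": 100, "CP36": 100}
high_fluorescence_reads = {"CP1": 180, "CP36": 20}
scores = log2_enrichment(high_fluorescence_reads, unsorted_reads)
functional = {cp for cp, s in scores.items() if s > 0}
```

As in the text, a score greater than zero means the variant is over-represented in the high-fluorescence gate relative to the input library.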

Fig. 2.


FACS-seq data for the CP library in different E. coli strains and correlations between functional scores and structural parameters. (A) FACS-seq log-fold change enrichment scores for the CP library screened in BL21, BW25113, Δtig, and Δlon cells. Enrichment scores are plotted on a log2 scale and error bars show the score ± the SE obtained from DESeq2 analyses. CPs with scores >0 are functional. Yellow boxes are drawn based on contiguous stretches of functional CPs in the BL21 dataset. Orange and purple arrows indicate positions of CP33 and CP217, respectively. (B) Correlations between FACS, gel scores, and structural parameters. Correlation scores with a Bonferroni-corrected P-value > 0.05 are considered nonsignificant (n.s., white boxes, see SI Appendix, Fig. S6 for two-sided P-values). FT gel, freeze-thaw lysis condition gel assays; DB gel, detergent-based lysis condition gel assays. (C) Summary of CPs with high or low function across all FACS datasets. CPs with consistent scores >0 are shown in pink, while CPs with scores <0 are shown in purple.

CP Function Is Highly Robust in Multiple Cellular Environments.

In the cell, protein folding is often dependent on the cellular environment and specific proteostasis factors. The above high-throughput assay allows us to ask how important these cellular factors are in the successful folding of circular permutants. Because circular permutation changes which region of a protein is synthesized first during translation, we expected that CP nascent chains could differentially interact with the cotranslational chaperone trigger factor, potentially impacting functional levels in cells. We were also curious to see whether abundance of Lon protease in the cell could impact the function of CPs prone to misfolding or aggregation, as Lon is deleted in BL21 expression strains and is therefore absent in the cellular environment where we carried out our gel assays. We therefore screened our library of HaloTag CPs in three additional cellular environments: the Keio collection trigger factor- and Lon-deletion strains (Δtig and Δlon, respectively), in addition to the Keio parent E. coli strain BW25113 (41). (There are additional genetic differences between the BL21 and Keio collection strains, and therefore the knockout data should be compared directly with BW25113.) Fig. 2 shows that many CPs retain high (Fig. 2C, pink residues) or low (Fig. 2C, purple residues) levels of function across all cell types. When comparing these CP functional data in the Δtig and Δlon deletion strains with the BW25113 parent strain, we identify subsets of CPs that show changes in function in these altered cellular environments. However, these changes are not statistically significant (Wald test P-value > 0.05, SI Appendix, Fig. S7). Thus, perhaps surprisingly, this analysis suggests that the in vivo determinants of functional protein folding after relocating the N-terminus in HaloTag are independent of these cellular factors.

High Levels of CP Function and Solubility Are Correlated with Termini Insertion in Flexible, Solvent-Accessible Regions of HaloTag.

Fig. 2C shows that in general, highly functional CPs have their termini introduced in lid loops, lid helices, core loops, and in scattered positions in core helices, while poorly functional CPs have termini introduced in the core β-sheet and some core helices. The region of the lid α-helix H’ that packs together the core and the top of the lid is also intolerant to termini insertion, and it is likely that the introduction of flexible termini in this region impairs TMR binding/ligation and may not directly affect folding.

With these qualitative observations, we set out to look for structural parameters associated with termini locations within the WT protein that have high correlations with CP functional and solubility scores. We find that high levels of function (as determined by either our FACS or gel-assay scores) are moderately correlated with termini insertion in regions with low hydrophobicity, high solvent-accessible surface area (SASA), and high crystal structure B-factors (Fig. 2B and SI Appendix, Fig. S6, Pearson 0.21 < r < 0.43). There are also similar correlations (0.19 < r < 0.26) between these parameters and high levels of soluble protein expression. No significant correlation is observed between high function and relative contact order (RCO), which quantifies the average sequence separation between residues in contact in the native state. RCO has previously been shown to have strong correlations with changes in folding rate in small two-state refolding proteins, but not in circular permutants of a singular protein (42, 43). Based on our data with HaloTag CPs, RCO also does not predict high function or high solubility. We also find no significant correlations between high function and solubility of CPs and the N-end rule (−0.01 < r < 0.14), which describes how the N-terminal residue identity of a protein can influence the length of time a protein remains in the cell before degradation (44–46). Many of these structural parameters were also incorporated into the CPred machine learning model for predicting functional CP sites within a protein (47, 48). The CPred predictions for CP sites in HaloTag show moderate agreement with our identified functional CP locations (0.43 < r < 0.55).
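Because relative contact order depends on sequence separation, it changes when the termini move even if the native contacts do not. The sketch below illustrates this using the commonly used definition, RCO = (sum of sequence separations over all contacts) / (number of contacts × chain length). It assumes that the set of native contacts is unchanged by permutation and that the linker contributes no contacts; the function names and the toy contact are hypothetical, and this is not necessarily how RCO was computed in the study.

```python
def permuted_index(res: int, start: int, wt_len: int, linker_len: int) -> int:
    """Map a 1-based WT residue number to its 1-based position in CP<start>."""
    if res >= start:
        return res - start + 1
    # Residues before the new start follow the WT C-terminal segment and the linker.
    return (wt_len - start + 1) + linker_len + res


def relative_contact_order(contacts, start: int, wt_len: int, linker_len: int = 8) -> float:
    """RCO of CP<start>, given native contacts as (i, j) pairs in WT numbering."""
    total_len = wt_len + linker_len
    separations = [
        abs(permuted_index(i, start, wt_len, linker_len)
            - permuted_index(j, start, wt_len, linker_len))
        for i, j in contacts
    ]
    return sum(separations) / (len(separations) * total_len)


# Toy example: one contact between the WT termini of a 10-residue chain, no linker.
toy_contacts = [(1, 10)]
rco_wt_order = relative_contact_order(toy_contacts, start=1, wt_len=10, linker_len=0)
rco_cp2 = relative_contact_order(toy_contacts, start=2, wt_len=10, linker_len=0)
```

In the toy case, a contact that is long-range in the WT order (separation 9) becomes a nearest-neighbor contact in CP2 (separation 1), which is why permutation can shift RCO substantially without any change in the native fold.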

We were also interested in studying how the observed changes in protein function for each circular permutant could relate to the evolutionary conservation at the site of the new N-terminal residue. We used EVCouplings [v2.evcouplings.org, (49)] to calculate the sequence conservation and mutational tolerance of each residue in HaloTag based on an alignment with 4,850 closely related protein sequences. We theorized that termini insertion may be unfavorable in regions of the protein that are highly conserved or show low mutational tolerance. Regions under high evolutionary selection may be critical for proper structure formation and catalytic function and thus would be incapable of accommodating insertion of flexible, charged termini through circular permutation. We therefore compared our FACS and gel assay scores to several parameters: sequence conservation at each position, the number of evolutionary couplings between residues (coupling scores reflect the likelihood that two residues are in contact), and the mutational tolerance of each position calculated using two different models.

Residues with the highest evolutionary coupling scores are found primarily in the core β-sheet, as are the residues with the lowest mutational tolerance (SI Appendix, Fig. S8). This suggests there are strong evolutionary restraints placed on the HaloTag core, and maintaining residue contacts with the β-sheet could be critical for folding. There is no correlation (−0.14 < r < −0.01) between high levels of CP function and high levels of sequence conservation. There are stronger correlations (0.19 < r < 0.40) between low coupling scores/high mutational tolerance and CP function (Fig. 2C). Although circular permutation is unlikely to occur frequently during evolution, both mutation and circular permutation require structural flexibility to be accommodated. Overall, HaloTag tolerates termini insertion in flexible, surface-exposed regions of the protein where there are few contacts between residue side chains and low levels of evolutionary coupling to other residues in the protein.

Biophysical Analyses.

Our comprehensive analysis has shown that circular permutation can impact the folding and function of a protein in vitro and in vivo. The lack of dependence on individual proteostasis factors suggests that the inherent energy landscape of HaloTag is changing with termini relocation and affecting the overall protein function. To probe the details of how permutation can affect the folding trajectory and energetics of a protein, we selected two circular permutants, CP33 and CP217, for detailed biophysical characterization in vitro. Both of these CPs have high functional scores in our FACS datasets. In our gel assay, CP217 is soluble and functional, while CP33 is functional, but aggregation-prone. CP217 has termini introduced between the core and lid subdomains of HaloTag and therefore contains the intact protein core N-terminal to the protein lid (Fig. 1A, purple spheres). CP33 places the termini between β2 and β3, therefore moving β1 and β2 to the C-terminus (Fig. 1A, orange spheres). This could slow the formation and collapse of the core β-sheet during both refolding and cotranslational folding.

Circular Permutation Modulates Structure and Stability.

To evaluate the overall structure of the permutants, we turned to circular dichroism (CD) spectroscopy. Fig. 3A shows the far-UV CD spectra of purified HaloTag variants. With the exception of CP217, all show similar CD spectra. CP217, however, displays a slightly altered CD spectrum, suggesting CP217 contains less secondary structure than the other proteins.

Fig. 3.


CD spectra and urea denaturation of CP variants. (A) Far-UV CD spectra of HaloTag (black), CP1 (blue), CP33 (orange), and CP217 (purple). Shading represents the SD of each measurement. (B) Urea-induced denaturation monitored by CD at 225 nm and fit with two- or three-state models. HaloTag (black circles) and CP1 (blue diamonds) show one unfolding transition around 4 M urea, CP217 (purple triangles) shows one broad unfolding transition at 5 M urea, and CP33 (orange squares) shows two transitions at 2 M and 5 M urea. CP33 populates an equilibrium intermediate not observed in HaloTag.

Protein stability was evaluated by equilibrium urea-induced denaturation, monitoring the CD signal at 225 nm (Fig. 3B). These resulting denaturation curves were analyzed using a standard two-state (unfolded ⇌ folded) or three-state (unfolded ⇌ intermediate ⇌ folded) linear-extrapolation model (50, 51). With the exception of CP33, all show a single cooperative unfolding transition. The unfolding transitions for HaloTag and CP1 overlap with a resulting Cm of 4 M urea (Table 1). CP217 shows a broader transition with a lower m-value. This lower m-value, along with the change in the CD spectrum, is consistent with a partially unfolded native structure that has a smaller change in surface area upon unfolding. Interestingly, CP33 shows two transitions at 2 M and 5 M urea, uncovering an equilibrium intermediate not observed in the other proteins. After this initial analysis, we evaluated additional lid CPs, uncovering several CPs which show three-state unfolding with an early transition between 0 and 2 M urea and a second transition observed at 5 M urea (SI Appendix, Fig. S9 and Table S1). These CPs were selected based on both our high-throughput results and previous studies using variants of HaloTag (52, 53). In sum, these data show the unfolding of many HaloTag CPs is less cooperative than the unfolding of WT HaloTag, particularly those with termini insertion in the lid region.
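The two-state linear-extrapolation model mentioned above can be sketched as follows: the unfolding free energy is taken to fall linearly with urea concentration, ΔG_unfold([urea]) = ΔG_unfold(H2O) − m·[urea], so the midpoint is Cm = ΔG_unfold(H2O)/m. The function names are hypothetical, the temperature of 25 °C is an assumption (it is not stated in this section), and Table 1 reports folding free energies, so the sign is flipped to obtain ΔG_unfold.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol·K)


def midpoint(dg_unfold_water: float, m_value: float) -> float:
    """Urea concentration (M) at which folded and unfolded states are equally populated."""
    return dg_unfold_water / m_value


def fraction_folded(urea: float, dg_unfold_water: float, m_value: float,
                    temp_k: float = 298.15) -> float:
    """Two-state fraction folded at a given urea concentration.

    temp_k defaults to 25 °C, which is an assumption made for illustration.
    """
    dg_unfold = dg_unfold_water - m_value * urea
    return 1.0 / (1.0 + math.exp(-dg_unfold / (R_KCAL * temp_k)))


# HaloTag values from Table 1: ΔG_fold = −6.46 kcal/mol, m = 1.59 kcal/(mol·M).
dg_unfold_halotag = 6.46  # sign flipped: unfolding free energy in water
m_halotag = 1.59
cm_halotag = midpoint(dg_unfold_halotag, m_halotag)
```

For HaloTag this gives a midpoint of about 4.06 M urea, matching the Cm listed in Table 1. A three-state fit, as used for CP33, would add a second transition with its own ΔG and m-value.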

Table 1.

CD melt and kinetic refolding fit parameters

Parameter | HaloTag | CP1 | CP33 | CP217
ΔG1,fold (kcal/mol) | −6.46 ± 0.43 | −6.21 ± 0.21 | −5.71 ± 0.32 | −5.21 ± 0.22
m1 (kcal/(mol·M)) | 1.59 ± 0.11 | 1.58 ± 0.05 | 2.99 ± 0.30 | 1.06 ± 0.05
Cm,1 (M) | 4.06 ± 0.54 | 3.93 ± 0.26 | 1.91 ± 0.02 | 4.92 ± 0.27
ΔG2,fold (kcal/mol) | | | −3.78 ± 0.60 |
m2 (kcal/(mol·M)) | | | 0.76 ± 0.21 |
Cm,2 (M) | | | 4.97 ± 0.39 |
t1/2,fast,CD,10 °C (s) | 611 ± 9 | | 1362 ± 22 | 220 ± 10
t1/2,slow,CD,10 °C (s) | 4548 ± 257 | | 12415 ± 127 | 1450 ± 72
Aburst,CD,10 °C (MRE × 10^−3) | 5.48 ± 0.06 | | 4.50 ± 0.02 | 6.02 ± 0.05
Afast,CD,10 °C (MRE × 10^−3) | 3.34 ± 0.04 | | 1.12 ± 0.01 | 1.06 ± 0.03
Aslow,CD,10 °C (MRE × 10^−3) | 1.90 ± 0.02 | | 1.92 ± 0.01 | 0.55 ± 0.02

Amide Hydrogen Exchange Monitored by Mass Spectrometry Shows Circular Permutation Can Alter Local Energetics.

To evaluate the local structure and stability of circular permutants, we turned to hydrogen–deuterium exchange monitored by mass spectrometry (HDX/MS) (Fig. 4A). HDX/MS follows the exchange of amide protons with solvent deuterons as a function of time at the resolution of individual peptides. Slowing of exchange, or protection, results from stabilization of backbone hydrogens in hydrogen bonds and/or lack of solvent accessibility (54, 55). Using HDX/MS, we were able to monitor peptides throughout the entire protein (99.7% coverage, Fig. 4B). In the absence of urea, WT HaloTag and CP33 show similar exchange profiles with slowed exchange in the expected regions of secondary structure (Fig. 4 C and D, all peptides), indicating a wild-type structure with a well-folded protein core and lid in CP33. To probe the structure of the equilibrium intermediate observed for CP33, we monitored amide exchange in 3.5 M urea. Most peptides in CP33 were fully exchanged within the first time point, with only the peptides spanning β5-7 and three core helices hB, hC, and hI showing measurable protection from exchange (Fig. 4C peptides 81 to 90 and 114 to 128, summary in Fig. 4E, more peptides in SI Appendix, Fig. S10). These data suggest the CP33 equilibrium intermediate is composed of a minimal folded core, while the lid, β1-4, and core helices hA, hJ, hK, and hL are unfolded.

Fig. 4.


The CP33 equilibrium intermediate maps to the protein core, and the CP217 lid is stabilized upon binding of TMR. (A) Schematic outlining continuous-labeling HDX/MS experiments. Protein is diluted into deuterated buffer, and exchange is quenched at each time point. Samples are injected onto an LC/MS setup for inline proteolysis, peptide separation, and analysis via MS. (B) Peptide coverage of WT HaloTag obtained from tandem MS/MS experiments. Peptides covering 99.7% of the HaloTag sequence are obtained, with black lines indicating the subset of peptides analyzed in all experiments. (C) Mass spectra of five example peptides after 10 s of HDX. The dashed blue line marks the monoisotopic mass of the peptide. Bimodal mass spectra are fit with a sum of two Gaussian curves (solid colored line), with the single-Gaussian fits represented in orange and blue (dashed black lines). Less-deuterated blue peaks correspond to folded protein, while the heavier orange peaks correspond to unfolded protein. (D) Example peptides from (C) mapped onto the HaloTag structure. (E) Summary of all peptides structured in the CP33 intermediate (purple). (F) Summary of peptides unfolded in CP217 (orange), and (G) peptides that fold upon binding of TMR in CP217 (purple, highly stabilized; pink, slightly stabilized). Black residues in (F and G) do not have peptide coverage in CP33 or CP217, respectively.

In contrast to WT HaloTag and CP33, the native state of CP217 only shows protection from exchange in the core, with a highly destabilized or unfolded lid that is fully exchanged even in the absence of urea (Fig. 4C peptide 166 to 183, summary in Fig. 4F, more peptides in SI Appendix, Fig. S10). These data suggest that the helical lid needs to be conformationally restrained at both its N and C termini in order to form a stable structure. Because our gel-based assays indicate that CP217 can fold and function (bind TMR), we carried out the same HDX/MS experiment on CP217 in the presence of the ligand TMR. Under these conditions, we see slowed hydrogen exchange in the lid and stabilization in some core peptides (Fig. 4C peptides 166 to 183 and 263 to 273, summary in Fig. 4G, more peptides in SI Appendix, Fig. S10) consistent with folding upon ligand binding.

In sum, these HDX/MS studies show that circular permutation can alter the local energetics while still maintaining the functional fold.

Circular Permutation Impacts Protein Folding Kinetics.

The in vitro refolding of HaloTag is known to be aggregation-prone when carried out at 0.8 M urea (37 °C), forming visible precipitate (35). CP33 and CP217 also show aggregation under these conditions, though with less apparent insoluble aggregate (SI Appendix, Fig. S11). However, when refolded at 10 °C, all three proteins fold solubly with similar kinetic phases but different rates (Fig. 5, Table 1, and SI Appendix, Fig. S11). The amplitude of the burst phase is similar in each CP, suggesting that all might fold through a similar burst-phase intermediate (Table 1). In CP217, most of the CD signal is obtained during the burst phase, likely due to the lack of stabilization observed in the protein lid (see below).

Fig. 5.

HaloTag CPs show altered refolding kinetics. (A) Refolding to 0.8 M urea monitored by CD at 225 nm for HaloTag (gray), CP33 (orange), and CP217 (purple) at 10 °C. Refolding traces are fitted with biphasic kinetics (black lines), and CD mean residue ellipticity (MRE, deg·cm²·(dmol·res)⁻¹ × 10⁻³) of the unfolded protein is marked with a black point and black arrow. (B) Fit residuals for the first 500 s of refolding time.

CP33 Populates a Smaller Refolding Intermediate than HaloTag, Which Resembles Its Equilibrium Intermediate.

The changes in aggregation propensity between the different permutants suggest there may be a change in the folding pathway. To investigate the structural details of the folding trajectories, we turned to pulsed-labeling HDX/MS at 10 °C (Materials and Methods and Fig. 6A) (35, 56, 57). Bimodal mass spectra (heavy and light peaks) were observed for most peptides in all three permutants, where the heavier peak corresponds to the unfolded state and the lighter peak corresponds to a folded state protected from exchange (Fig. 6 B and C). As the protein refolds, the heavier peak decreases in intensity and the lighter peak increases in intensity. These mass spectra were fitted to a sum of two Gaussian curves to calculate the population percentages in the folded/unfolded states at each time point (58).

Fig. 6.

CP33 refolds through a different trajectory than HaloTag. (A) Schematic outlining pulsed-labeling HDX/MS experiments. Protein denatured in 7.5 M urea is diluted to 0.8 M urea, and samples are removed at each refolding time point to dilute into deuterated buffer. Hydrogen exchange is quenched after 10 s, and samples are run through an LC/MS setup for inline proteolysis, peptide separation, and injection into the MS. (B) Mass spectra of three example peptides from HaloTag, CP33, and CP217 after 10 s, 600 s, and 3,600 s of refolding. The dashed blue line marks the monoisotopic mass of the peptide. Bimodal mass spectra are fit with a sum of two Gaussian curves (solid colored line), with the single-Gaussian fits represented in orange and blue (dashed black lines). Less-deuterated blue peaks correspond to folded protein, while the heavier orange peaks correspond to unfolded protein. Peptides are considered fast-folding if the measured populations are >50% folded at 600 s. (C) Example peptides from (B) mapped onto the HaloTag structure. (D–F) Summary of all fast-folding peptides in HaloTag, CP33, and CP217, colored in blue. Gray regions are slow-folding, and (F) orange regions in CP217 are fully exchanged in the folded protein control. (E and F) Black residues in CP33 and CP217 represent regions with no coverage due to termini insertion at a new location in each CP, and spheres indicate the new N-terminal residue in each CP.

We categorized peptides as fast-folding if they are >50% in the folded peak after 600 s (t1/2 of the HaloTag fast-folding phase obtained in CD) of refolding (Fig. 6 B and C). In WT HaloTag (Fig. 6D), we find that the fast-folding intermediate is made up of peptides spanning β1-7 in the core β-sheet and in five helices (hA, hB, hC, hI, and hL) packing onto this core sheet. In CP217, the same set of peptides is fast-folding as in WT HaloTag, despite the predominantly unfolded lid region in the CP217 native state (Fig. 6F). Remarkably, the early intermediate in CP33 is notably different from the others (Fig. 6E). The CP33 refolding intermediate is smaller than that observed in HaloTag, with a refolding core composed of only β5-7 and the three core helices packing onto one side of this smaller sheet (hB, hC, and hI). β1-4 and helices hA and hL are now slow-folding. These fast-folding peptides are the same peptides that make up the CP33 equilibrium intermediate, indicating that CP33 folds through this stabilized equilibrium intermediate in vitro. This smaller fast-folding structure is also consistent with the smaller observed amplitude of the fast-folding phase in CD experiments (Table 1).
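The population calculation and the >50% criterion described above can be illustrated with a minimal sketch. It assumes that each bimodal spectrum has already been fit to two Gaussians and that the population in each state is proportional to the area under its Gaussian; the function names and the example parameters are hypothetical and are not taken from the in-house scripts used in this study.

```python
import math

def gaussian_area(amplitude, sigma):
    """Area under amplitude * exp(-(x - mu)**2 / (2 * sigma**2))."""
    return amplitude * abs(sigma) * math.sqrt(2.0 * math.pi)

def folded_fraction(light_peak, heavy_peak):
    """Fraction of the population in the lighter (protected, folded) peak.

    Each peak is an (amplitude, sigma) pair from a two-Gaussian fit.
    Assumption: population is proportional to Gaussian area.
    """
    light = gaussian_area(*light_peak)
    heavy = gaussian_area(*heavy_peak)
    return light / (light + heavy)

def is_fast_folding(fraction_folded_at_600s, threshold=0.5):
    """Apply the >50%-folded-at-600-s criterion to one peptide."""
    return fraction_folded_at_600s > threshold

# Hypothetical fit parameters for one peptide at the 600 s time point.
fraction = folded_fraction(light_peak=(3.0, 1.0), heavy_peak=(1.0, 1.0))
print(fraction, is_fast_folding(fraction))
```

Using areas rather than peak heights matters when the two Gaussians have different widths, which is why the width enters the calculation.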

In addition to the fast-folding core, there is also a set of peptides that show a burst phase, or gain of structure within the first 10 s of refolding, defined as a leftward shift in the heavy unfolded peak (indicating protection) when comparing the unfolded control to the first 10-s time point (SI Appendix, Fig. S12 A and B). In all three proteins studied, the burst peptides map to the core β-sheet and helices, suggesting an initial collapse of the core within the deadtime of the experiment. All HaloTag fast-folding peptides show this burst-phase behavior except for peptides 9 to 18 and 286 to 294 (SI Appendix, Fig. S12C, the N-terminal β-strand and C-terminal helix). CP217 and CP33 behave very similarly to HaloTag, with notable exceptions that suggest changes in the folding landscape. In CP33, the same set of peptides that have burst-phase behavior in HaloTag also have burst-phase behavior (except for peptide 19 to 32, which is located at the CP33 C-terminus) (SI Appendix, Fig. S12 A and B, summary in SI Appendix, Fig. S12D). However, some of these peptides now fold slowly after the burst in CP33 (e.g., Fig. 6B peptide 46 to 66). Therefore, CP33 shows a change in the folding pathway for these regions of the protein after the initial collapse occurs. In CP217, peptide 286 to 294 uniquely shows a burst-phase gain of 0.5 Da that is not observed in HaloTag or in any other CP analyzed with this approach (SI Appendix, Fig. S12 A and B, summary in SI Appendix, Fig. S12E).

As an orthogonal approach to monitor the change in structure of the refolding intermediate in CP33, we turned to pulsed thiol-labeling experiments (35). This experiment monitors folding at the level of burial of the two native cysteines (C61 and C262). Our model predicts that both cysteines should gain protection slowly in CP33, while in WT C61 gains protection during the fast phase. Rates of protection from thiol-labeling can be measured to calculate site-specific rates of refolding (SI Appendix, Fig. S13 A–C). WT HaloTag shows biphasic protection from cysteine-labeling, with t1/2s for the fast- and slow-phases on the same orders of magnitude observed in CD experiments, consistent with only one cysteine (C61) gaining protection in the fast-folding phase. CP33, on the other hand, shows slow protection and single-phase kinetics, consistent with both cysteines being located in the slow-folding region of the protein. This slow rate of protection is also consistent with the slow-phase folding observed by CD (SI Appendix, Fig. S13 D–F and Table S2). Our findings that CP33 contains a change in folding trajectory are therefore supported by experiments probing both secondary (via HDX) and tertiary (via thiol labeling) structure.

Discussion

We have taken a comprehensive approach to investigate how repositioning the N-terminus impacts a protein’s energy landscape by evaluating every possible circular permutant of the protein HaloTag. Relocation of protein termini to a new position through circular permutation has the potential to affect protein structure and function both in vivo and in vitro. For instance, placement of the N-terminus is expected to bias folding pathways and could either favor or disfavor aggregation-prone intermediates, which could lead to novel aggregation or engagement of proteostasis factors not essential for the wild-type protein. To evaluate these features, we examined the folding and function of all 297 circular permutants using both an in vitro gel assay and an in-cell high-throughput FACS screen. With our gel-based assay, we find that HaloTag is highly soluble when overexpressed in bacterial cells, yet some functional CPs show a novel propensity to aggregate, which is not observed in the wild-type protein. Termini insertion within the protein core is most likely to yield aggregation-prone proteins, while the lid is amenable to termini insertion while retaining both high solubility and functional levels. In general, HaloTag is very tolerant of circular permutation, with 60% of CPs retaining high levels of function and another 18% retaining low levels of TMR binding.

A combination of FACS and next-generation sequencing allowed us to screen every possible HaloTag CP in four strains of E. coli: the BL21 expression strain, the Keio collection parent strain BW25113, and the Keio Δtig and Δlon strains, which knock out the cotranslational chaperone trigger factor (TF) and the intracellular protease Lon, respectively (41). Since circular permutation inherently changes the in vivo order of synthesis of secondary structural elements, some CPs generate highly hydrophobic stretches of residues at their N termini as the protein core is synthesized. These hydrophobic regions might be selectively recognized by, and interact with, TF during synthesis, especially if the collapse of the hydrophobic core is delayed cotranslationally. Additionally, TF contains a peptidyl-prolyl cis/trans isomerase domain (59) and could be required for the in vivo folding of HaloTag, as HaloTag contains three cis-proline peptide bonds in the wild-type structure and proline makes up 10% of its sequence composition. Therefore, we expected to identify changes in functional protein folding in the Δtig cellular environment. We also investigated the effect of Lon protease on CP function, as Lon is a highly conserved intracellular protease that is responsible for clearing misfolded proteins and is deleted in BL21 cells (60). We were interested to see whether any of the functional, aggregation-prone proteins identified in our gel-based assays would show changes in function in the presence or absence of this protease in BW25113 cells.

Contrary to our expectations, we found that the effect of circular permutation on HaloTag is robust to changes in the cellular proteostasis machinery. There are no significant changes in function for CPs when comparing the Δlon and parent BW25113 datasets, and only one CP (CP290) becomes significantly nonfunctional in the absence of TF. After comparing various structural parameters to the gel-assay and FACS scores, we find that termini insertion in buried, restricted locations is most disruptive to HaloTag function and solubility when expressed in cells independent of the strain studied. While there are good correlations between these parameters and our CP scores, there still does not seem to be a perfect predictor of high levels of CP function.

What structural features are successful for starting a protein sequence? The CPred server applies several machine learning techniques to calculate the probability of successful termini insertion at each position in a protein (47, 48). Indeed, we see a moderate correlation between our data and predictions of viable CP locations from CPred (Pearson 0.43 < r < 0.55), though the correlation between CP solubility and CPred predictions is slightly worse (Pearson r = 0.34). Termini relocation to regions of the protein with low evolutionary coupling scores and high mutational tolerance is also more likely to yield a functional CP. It is interesting that termini insertion in highly coupled regions of the protein is deleterious to protein function and solubility. This could be due to the disruption of conserved stretches of contacts with the introduction of flexible, charged termini. These strong contacts could also be critical to proper formation of folding intermediates, and breaking these contacts through termini insertion could disrupt intermediates populated cotranslationally.

Our results, therefore, suggest that a) the successful folding of HaloTag is highly influenced by termini location and that b) the misfolding and/or aggregation of CPs is impacted more by changes in folding and introduction of termini in conformationally restrained regions than by changes in interactions with proteostasis factors. Functional CPs are most likely to have termini introduced in conformationally flexible, surface-exposed regions with few to no side-chain contacts with other residues. Additionally, one of the strongest predictors of aggregation propensity in CPs is termini insertion within the fast-refolding intermediate we identified by HDX/MS (Pearson r = 0.37). Termini insertion in this region likely disrupts the collapse and stabilization of the refolding core, which would increase the exposure of hydrophobic residues and the likelihood of forming intermolecular contacts leading to aggregate formation.

Interestingly, relocating the termini in HaloTag differentially impacts the energy landscape of the native state. The CPs we examined biophysically show differences in stability between the lid and core subdomains compared to the wild-type protein, where the folding of these regions is highly coupled. In fact, several CPs show three-state equilibrium behavior, in contrast to the two-state nature of WT HaloTag, and the lid region of the protein is also highly destabilized in many CPs despite retaining high levels of function. This was initially surprising given that the lid makes up most of the ligand-interacting surface during binding, but given the aliphatic nature of HaloTag ligands, the lid likely collapses onto the long carbon chain in Halo-TMR.

In addition to the structure and stability of the folded state, the folding process itself is affected by circular permutation. Not surprisingly, the kinetics of folding depend on the specific permutation. What is perhaps more surprising, however, is that the folding pathway is altered, and here we identify a change in the structure of a refolding intermediate with circular permutation. The fast-refolding intermediate of CP33 comprises less of the core region than wild-type HaloTag. CP33 and CP217 also show less aggregation propensity than HaloTag, though the mechanism of avoiding aggregation is unclear. Interestingly, CP33 is aggregation-prone when expressed in cells, suggesting there could also be a change in the cotranslational folding pathway for this CP as compared to WT HaloTag. These changes in protein energetics, folding rates, and folding pathways are consistent with previous observations in circular permutant studies. For example, in a randomized CP study of DsbA, an equilibrium intermediate is observed in chemical denaturation experiments with an oxidized CP that is not observed in wild-type DsbA, similar to what we see in CP33 and some of the lid CPs of HaloTag (61). In CP studies of the ribosomal protein S6 from Thermus thermophilus, CPs show changes in protein stability and folding kinetics compared to wild-type protein (62). Finally, phi-value analyses on CPs of the small ribosomal protein S6 also present evidence for folding through parallel pathways. α-spectrin SH3 CPs show changes in their folding transition states, indicating that refolding through different trajectories may be a general feature of circular permutation (9, 62).

This work adds to the small database of comprehensive studies on the effects of circular permutation within a protein. In a systematic study of E. coli dihydrofolate reductase (DHFR) circular permutants (63), 10 contiguous stretches of CPs were identified that are incapable of folding, suggesting that these regions are critical “folding elements” in DHFR. These regions include 10 of the 13 residues known to gain early (6 ms) protection during folding as determined by pulsed-labeling HDX-NMR (64). Similarly, in a randomized circular permutation study examining 65 CP sites out of 189 possible in DsbA (61), functional CPs did not have termini introduced in certain helices and β-strands, and rationally designed CPs in these regions were inactive. Our HaloTag FACS-seq data also identify contiguous stretches of nonfunctional CPs, many of which are found within the fast-folding HaloTag intermediate, showcasing how termini insertion within fast-folding structural elements is highly disruptive to proper folding and function.

Our systematic demonstration that termini relocation in HaloTag can drastically impact soluble, functional protein expression in a manner independent of changes in the cellular environment has important implications for both de novo protein design and for engineering proteins for improved expression. These findings may also shed light on the changes in the folding of proteins proteolytically processed in the cell, resulting in a new N-terminus. We can only speculate on how the trends we observe between successful termini relocation and structural parameters would extend to larger, more-complex protein systems. It would be interesting to perform similar CP studies and test our observations in proteins that can undergo changes in oligomeric state, regulation by a binding partner, or that are known chaperone substrates. These data could then be used to train additional CP prediction models or to help improve existing models, which to date have primarily been developed using a small number of systematic and randomized CP datasets (47, 48). Additionally, performing FACS-seq in cellular environments lacking other cotranslational chaperones such as the DnaK/DnaJ/GrpE system or posttranslational chaperones such as the Hsp90s, or growing cells at lower temperatures to decrease rates of translation in the cell, could provide further insight into whether these changes in the environment are sufficient to modulate the successful folding of HaloTag CPs. We also recognize that all of our in-cell functional results come from experiments in which we are inducing overexpression of CPs with an IPTG-inducible promoter, and this could mask any subtle changes in protein function. Given that multiple chaperones often interact with the same protein clients, it is also possible that no effect is seen on CP function in the Δtig results presented here due to compensating interactions with DnaK and other chaperones. It would therefore also be interesting to perform these experiments under a more tightly regulated titratable promoter, or to examine changes in function between overexpression and leaky expression conditions, to see whether reducing the amount of CP being made in each cell then impacts functional protein folding.

Materials and Methods

A Gel-Based Assay for Determining CP Function and Solubility.

All 297 CP sequences were subcloned from a tandem HaloTag plasmid containing the HaloTag–GTGSGSGS–HaloTag cDNA sequence (SI Appendix, Methods). Cultures (5 mL) were seeded from saturated overnight cultures of E. coli BL21 (DE3) cells expressing each CP. Cells were grown to an OD600 of ~0.6, induced with 1 mM IPTG, and lysed either with the detergent-based BugBuster Protein Extraction Reagent (Millipore) or by three freeze-thaw cycles in lysis buffer. Samples of whole cell lysate (L) and the clarified supernatant (S) were taken and stained with the HaloTag ligand TMR, diluted to 2.5 μM in 25 mM HEPES pH 7.5, 15 mM Mg(OAc)2, 150 mM KCl, and 0.1 mM TCEP (HKMT buffer). Samples were boiled for 5 min at 95 °C in an SDS loading buffer, then separated via SDS-PAGE on a NuPAGE 4 to 12% Bis-Tris 1.5 mm gel. Gels were imaged for TMR fluorescence, then Coomassie stained, destained, and imaged again for Coomassie staining. Functional scores were then calculated for each CP based on the ratio of in-gel TMR fluorescence to the amount of protein in the S fraction. Solubility scores were calculated by comparing the amount of protein in the S fraction vs. the L fraction (SI Appendix, Methods).
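To make the tandem-construct logic concrete, the sketch below shows how each circular permutant corresponds to a window of the tandem sequence. It is illustrative only and rests on labeled assumptions: that CPn begins at wild-type residue n, that the GTGSGSGS linker joins the original C- and N-termini in every CP, and that no extra start codon handling is needed. The toy sequence and function names are hypothetical.

```python
LINKER = "GTGSGSGS"  # linker joining the two HaloTag copies in the tandem construct

def circular_permutant(seq, new_start, linker=LINKER):
    """Return the protein sequence of a CP whose N-terminus is residue `new_start`.

    Assumption: residues are numbered from 1, and the linker bridges the
    original C-terminus to the original N-terminus.
    """
    if not 1 <= new_start <= len(seq):
        raise ValueError("new_start must lie within the sequence")
    i = new_start - 1
    # Equivalent to a window of length len(seq) + len(linker) starting at
    # position i of the tandem sequence seq + linker + seq.
    return seq[i:] + linker + seq[:i]

def all_permutants(seq, linker=LINKER):
    """Map a hypothetical name 'CPn' to the sequence of each permutant."""
    return {f"CP{n}": circular_permutant(seq, n, linker) for n in range(1, len(seq) + 1)}

# Toy example with a five-residue placeholder sequence.
toy = "ABCDE"
print(circular_permutant(toy, 3, linker="-"))
print(len(all_permutants(toy)))
```

Under these assumptions, every permutant has the same length and composition and differs only in where the chain starts, which is the property the screen relies on.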

Screening the CP Library via FACS-seq in BL21, BW25113, Δtig, and Δlon Cells.

The CPLib gene pool was inserted into either the pNRD_BL21 expression vector or the modified pNRD_ASKA expression vector (SI Appendix, Methods). All Keio collection strains were obtained from the Deutschbauer Lab at UC Berkeley. Homemade electrocompetent BW25113, Δtig, and Δlon strains were prepared before transformation with the CPLib_ASKA vector pool (SI Appendix, Methods). Commercial electrocompetent BL21 (DE3) cells (Sigma-Aldrich) were transformed with the CPLib_BL21 library following standard protocols and grown overnight at 37 °C. Transformation efficiencies were calculated using a standard titer protocol. Fresh 5 mL cultures were inoculated the following morning from the saturated overnights if the transformation efficiency was greater than 10^6 cfu (>3,000× the library size). Cells were grown to an OD600 ~ 0.6 before inducing protein expression with 1 mM IPTG and continuing shaking at 37 °C for 1 h. 200 μL of cells were removed and stained with 5 μM TMR for 1 h at room temperature. Cells were then washed three times with 1 mL PBS buffer (137 mM NaCl, 2.7 mM KCl, 10 mM Na2HPO4, and 1.8 mM KH2PO4 pH 7.4) and diluted to 2 mL total volume of cells before sorting on a Sony SH800 (SI Appendix, Methods). Sorted cell populations were miniprepped (Takara NucleoSpin Plasmid miniprep kit), and DNA was prepared for Illumina MiSeq next-generation sequencing via two consecutive PCRs (SI Appendix, Methods). Sequencing data were analyzed using in-house Python scripts. The DESeq2 RNA-sequencing analysis package was then used to determine enrichment scores for each CP, to measure error between replicates, and to assign P-values to each score (39). Computational analyses were then also done to calculate correlations between FACS enrichment scores and different structural parameters (SI Appendix, Methods).
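The library-coverage criterion above is simple arithmetic, sketched here under the assumption that the threshold is 10^6 cfu and that the library contains the 297 CPs described in this study. The function name is hypothetical.

```python
LIBRARY_SIZE = 297  # number of HaloTag circular permutants in the library

def fold_coverage(cfu, library_size=LIBRARY_SIZE):
    """Average number of independent transformants per library member."""
    return cfu / library_size

# A transformation yielding 1e6 cfu covers each CP a few thousand times on average.
coverage = fold_coverage(1e6)
print(round(coverage))
```

A coverage in the thousands makes it unlikely that any single CP is missing from the transformed pool by chance.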

Protein Expression and Purification.

Wild-type HaloTag and CPs were expressed in BL21 (DE3) cells in 1 L cultures. Cells were grown to an OD600 ~ 0.6, then expression was induced with 1 mM IPTG. Cells continued shaking at 37 °C for 3 h before pelleting cells and resuspending in 20 mL of 50 mM Tris-HCl pH 7.5. Pellet resuspensions were stored at −80 °C. Thawed pellets were sonicated on ice to lyse cells, and clarified supernatant was filtered and loaded onto a 5 mL HiTrap Q HP column (Cytiva 17115401). Protein was eluted off the column with a high salt gradient from 50 mM Tris-HCl pH 7.5 to 50 mM Tris-HCl pH 7.5, 1 M NaCl. Fractions containing HaloTag were pooled and run over a HiLoad 16/600 Superdex 75 size-exclusion chromatography column (GE), buffer-exchanging into HKMT buffer. Purity of all proteins was assessed via SDS-PAGE. Protein was immediately used for experiments, or stored short-term at 4 °C.

Circular Dichroism Spectroscopy: Equilibrium Denaturation and Protein Refolding.

CD experiments were done on either an Aviv 410 or an Aviv 430 CD spectrometer. Protein spectra were obtained with 0.5 mg/mL protein equilibrated overnight at 37 °C in HKMT buffer, measuring CD signal every 0.25 nm from 210 nm to 300 nm (SI Appendix, Methods). Spectra were buffer-corrected and then converted to mean residue ellipticity using the following equation (65):

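$$\mathrm{MRE} \times 10^{-3} = \frac{\text{CD signal}}{10 \cdot l \cdot n_{\mathrm{res}} \cdot c} \times 10^{-3},$$

where $l$ is the pathlength in cm, $n_{\mathrm{res}}$ is the number of residues, and $c$ is the protein concentration in M.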
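As a worked illustration of this conversion, the following minimal sketch implements the equation directly. It assumes the raw CD signal is in millidegrees, so that the result is in deg·cm²·(dmol·res)⁻¹; the function names and the example numbers are hypothetical.

```python
def mean_residue_ellipticity(cd_signal_mdeg, pathlength_cm, n_residues, conc_molar):
    """Convert a buffer-corrected CD signal to mean residue ellipticity.

    Assumption: cd_signal_mdeg is in millidegrees and conc_molar is the
    protein (not residue) concentration in mol/L.
    """
    return cd_signal_mdeg / (10.0 * pathlength_cm * n_residues * conc_molar)

def scaled_mre(cd_signal_mdeg, pathlength_cm, n_residues, conc_molar):
    """MRE x 10^-3, the scaled quantity given by the equation above."""
    return 1e-3 * mean_residue_ellipticity(
        cd_signal_mdeg, pathlength_cm, n_residues, conc_molar
    )

# Hypothetical reading: -10 mdeg, 0.1 cm cuvette, 100 residues, 10 uM protein.
print(scaled_mre(-10.0, 0.1, 100, 1e-5))
```

Normalizing by residue number and concentration allows traces from proteins measured at different concentrations to be compared on one axis.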

Chemical denaturation melts were set up by making two stocks of 0.1 mg/mL protein in HKMT and in HKMT with 8 M urea, then mixing the protein stocks at different ratios to make each melt sample and equilibrating overnight at 37 °C. CD signal was measured at 225 nm for each sample (SI Appendix, Methods). Melts were fit using Python scripts with the following global fit equations for two-state and three-state transitions, where F is the folded baseline, I is the intermediate baseline, and U is the unfolded baseline:

Two-state melt (50):

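$$\text{CD signal} = \frac{F_{\mathrm{intercept}} + F_{\mathrm{slope}}[\mathrm{urea}] + \left(U_{\mathrm{intercept}} + U_{\mathrm{slope}}[\mathrm{urea}]\right) e^{-(\Delta G - m[\mathrm{urea}])/RT}}{1 + e^{-(\Delta G - m[\mathrm{urea}])/RT}}.$$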

Three-state melt (51):

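$$\text{CD signal} = \frac{\left(F_{\mathrm{intercept}} + F_{\mathrm{slope}}[\mathrm{urea}]\right) e^{-m_1([\mathrm{urea}] - C_{m1})/RT} + I_{\mathrm{intercept}} + \left(U_{\mathrm{intercept}} + U_{\mathrm{slope}}[\mathrm{urea}]\right) e^{m_2([\mathrm{urea}] - C_{m2})/RT}}{1 + e^{-m_1([\mathrm{urea}] - C_{m1})/RT} + e^{m_2([\mathrm{urea}] - C_{m2})/RT}}.$$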
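For readers who wish to evaluate these models, a minimal sketch of both melt equations as plain Python functions follows. It is not the fitting code used in this study: the gas constant units (kcal·mol⁻¹·K⁻¹), the default temperature of 310.15 K (37 °C), and all parameter names are assumptions chosen for illustration, and the functions would still need to be passed to a least-squares routine to perform a fit.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1 (assumed units)

def two_state_signal(urea, f_int, f_slope, u_int, u_slope, dg, m, temp_k=310.15):
    """Two-state melt: folded and unfolded baselines weighted by exp(-(dG - m[urea])/RT)."""
    k_unfold = math.exp(-(dg - m * urea) / (R_KCAL * temp_k))
    folded = f_int + f_slope * urea
    unfolded = u_int + u_slope * urea
    return (folded + unfolded * k_unfold) / (1.0 + k_unfold)

def three_state_signal(urea, f_int, f_slope, i_int, u_int, u_slope,
                       m1, cm1, m2, cm2, temp_k=310.15):
    """Three-state melt with the intermediate as the reference state (weight 1)."""
    rt = R_KCAL * temp_k
    w_folded = math.exp(-m1 * (urea - cm1) / rt)   # dominates below Cm1
    w_unfolded = math.exp(m2 * (urea - cm2) / rt)  # dominates above Cm2
    folded = f_int + f_slope * urea
    unfolded = u_int + u_slope * urea
    return (folded * w_folded + i_int + unfolded * w_unfolded) / (1.0 + w_folded + w_unfolded)

# Hypothetical parameters: at the two-state midpoint (urea = dG/m) the signal
# lies halfway between the folded and unfolded baselines.
print(two_state_signal(3.0, -10.0, 0.0, -2.0, 0.0, dg=6.0, m=2.0))
```

In the three-state form, each transition is parameterized by its midpoint and m-value, so the folded and unfolded weights equal the intermediate weight exactly at Cm1 and Cm2, respectively.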

Protein stocks for refolding experiments were prepared at 2 mg/mL in HKMT with 7.5 M urea and equilibrated overnight at either 37 °C or 10 °C (SI Appendix, Methods). Protein was diluted 1:10 into refolding buffer to initiate refolding at 0.2 mg/mL, and the CD signal was measured at 225 nm every second during the experiment. UV-Vis absorbance spectra were taken of the recovered samples from 240 nm to 500 nm with an Agilent Cary UV-Vis Compact Peltier. If light absorbance was observed between 300 nm and 400 nm, indicative of protein aggregate formation, the sample was centrifuged at 16k rpm for 10 min. Spectra were then taken to determine what concentration of protein remained soluble after refolding. CD kinetic refolding traces were buffer-corrected and, if no aggregation was observed, traces were fit with single-exponential or double-exponential kinetics as done previously to measure rates of refolding (35):

Single-exponential kinetics: CD signal $y = y_0 + A_1 e^{-k_1 t}$

Double-exponential kinetics: CD signal $y = y_0 + A_1 e^{-k_1 t} + A_2 e^{-k_2 t}$
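The kinetic models above, and the relation between a fitted rate constant and its half-time (t1/2 = ln 2 / k), can be sketched as follows. The function names and example parameters are hypothetical, and the sketch only evaluates the models; it does not perform the fit.

```python
import math

def single_exponential(t, y0, a1, k1):
    """CD signal for one kinetic phase: y = y0 + A1 * exp(-k1 * t)."""
    return y0 + a1 * math.exp(-k1 * t)

def double_exponential(t, y0, a1, k1, a2, k2):
    """CD signal for two kinetic phases (fast and slow)."""
    return y0 + a1 * math.exp(-k1 * t) + a2 * math.exp(-k2 * t)

def half_time(k):
    """Half-time of a first-order phase with rate constant k (in 1/s)."""
    return math.log(2.0) / k

# Hypothetical example: a phase whose half-time is 600 s has k = ln(2) / 600.
k_fast = math.log(2.0) / 600.0
print(half_time(k_fast))
print(double_exponential(0.0, y0=1.0, a1=2.0, k1=0.1, a2=3.0, k2=0.01))
```

At t = 0 the double-exponential signal equals y0 + A1 + A2, so any difference between that value and the unfolded-state signal corresponds to change that occurs within the dead time.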

Continuous-Labeling and Pulsed-Labeling Hydrogen–Deuterium Exchange Coupled with Mass Spectrometry.

All HDX experiments were done with deuterated HKMT buffers containing zero or various concentrations of urea. Buffers were lyophilized and resuspended in D2O (Sigma-Aldrich 151882). To prepare proteins for continuous-labeling HDX, 10 μM HaloTag (HKMT, 0 M urea) and 10 μM CP33 (HKMT, 0 M or 3.5 M urea) were equilibrated overnight at 37 °C to mimic conditions used in CD melt experiments. For CP217 experiments with and without TMR, 10 μM CP217 was equilibrated with 50 μM TMR at 37 °C for 30 min before equilibrating overnight at 10 °C to mimic pulsed-labeling HDX/MS conditions. To initiate exchange, samples were diluted 1:10 into deuterated buffer. At each time point, the exchange mixture was diluted 1:1 into an ice-cold 2× quench buffer (3.5 M GdmCl, 1.5 M glycine, 0.5 M TCEP pH 2.4), flash-frozen in liquid nitrogen, and stored at −80 °C until LC-MS injection. In pulsed-labeling HDX experiments, 50 μM protein (HKMT, 7.5 M urea) was equilibrated overnight at 10 °C. Unfolded protein was diluted 1:10 into a temperature-equilibrated buffer to initiate refolding. At each folding time point, the refolding protein solution was diluted 1:10 into deuterated HKMT, quenched after 10 s of exchange by diluting 1:1 into ice-cold 2× quench buffer, flash-frozen in liquid nitrogen, and stored at −80 °C until LC-MS injection. Digestion and LC-MS were performed as previously described, and HDX data were analyzed using a combination of HDExaminer 3 (Sierra Analytics) and in-house Python scripts (SI Appendix, Methods) (58).

Supplementary Material

Appendix 01 (PDF)

Dataset S01 (CSV)

Acknowledgments

We thank the Marqusee Lab for experimental advice and support, particularly HT Hobbs for advice with library generation and NGS and SM Costello and SR Shoemaker for assistance with HDX/MS protocols. We thank the Kuriyan Lab for assistance with FACS and NGS, VV Trotter (Deutschbauer Lab) for the Keio collection strains, and A Hung and KB Sander (Arkin Lab) for the ASKA expression vector and assistance with FACS. We thank F Ramirez and C Rose from QB3 Genomics for assistance with NGS. We thank W Coyote-Maestas for assistance with structural parameter analyses. This work was supported by funding from the NIH (S.M.), the Chan Zuckerberg Biohub (S.M.), and the NSF Graduate Research Program Fellowship DGE 1752814 (N.R.D.). S.M. is a Chan Zuckerberg Biohub Investigator.

Author contributions

S.M. and N.R.D. designed research; N.R.D., C.A.T.F.M., and H.L.T.V. performed research; S.M. and N.R.D. analyzed data; and S.M. and N.R.D. wrote the paper.

Competing interests

The authors declare no competing interest.

Footnotes

This article is a PNAS Direct Submission.

Data, Materials, and Software Availability

All study data are included in the article and/or supporting information.

References

  • 1. Young R., Bremer H., Polypeptide-chain-elongation rate in Escherichia coli B/r as a function of growth rate. Biochem. J. 160, 185–194 (1976).
  • 2. Kramer G., Shiber A., Bukau B., Mechanisms of cotranslational maturation of newly synthesized proteins. Annu. Rev. Biochem. 88, 337–364 (2018).
  • 3. Cabrita L. D., Dobson C. M., Christodoulou J., Protein folding on the ribosome. Curr. Opin. Struct. Biol. 20, 33–45 (2010).
  • 4. Kaiser C. M., Liu K., Folding up and moving on—Nascent protein folding on the ribosome. J. Mol. Biol. 430, 4580–4591 (2018), 10.1016/j.jmb.2018.06.050.
  • 5. De Sancho D., Best R. B., What is the time scale for α-helix nucleation? J. Am. Chem. Soc. 133, 6809–6816 (2011).
  • 6. Bhushan S., et al., α-Helical nascent polypeptide chains visualized within distinct regions of the ribosomal exit tunnel. Nat. Struct. Mol. Biol. 17, 313–317 (2010).
  • 7. Holtkamp W., et al., Cotranslational protein folding on the ribosome monitored in real time. Science 350, 1104–1107 (2015).
  • 8. Nilsson O. B., et al., Cotranslational protein folding inside the ribosome exit tunnel. Cell Rep. 12, 1533–1540 (2015).
  • 9. Nilsson O. B., et al., Cotranslational folding of spectrin domains via partially structured states. Nat. Struct. Mol. Biol. 24, 221–225 (2017).
  • 10. Kudva R., et al., The shape of the bacterial ribosome exit tunnel affects cotranslational protein folding. Elife 7, e36326 (2018).
  • 11. O’Brien E. P., Christodoulou J., Vendruscolo M., Dobson C. M., New scenarios of protein folding can occur on the ribosome. J. Am. Chem. Soc. 133, 513–526 (2011).
  • 12. Clark P. L., Protein folding in the cell: Reshaping the folding funnel. Trends Biochem. Sci. 29, 527–534 (2004).
  • 13. Waudby C. A., Dobson C. M., Christodoulou J., Nature and regulation of protein folding on the ribosome. Trends Biochem. Sci. 44, 914–926 (2019), 10.1016/j.tibs.2019.06.008.
  • 14. Samatova E., Komar A. A., Rodnina M. V., How the ribosome shapes cotranslational protein folding. Curr. Opin. Struct. Biol. 84, 102740 (2024).
  • 15. To P., Whitehead B., Tarbox H. E., Fried S. D., Nonrefoldability is pervasive across the E. coli proteome. J. Am. Chem. Soc. 143, 11435–11448 (2021).
  • 16. Ugrinov K. G., Clark P. L., Cotranslational folding increases GFP folding yield. Biophys. J. 98, 1312–1320 (2010).
  • 17. Nimmesgern E., Hartl F. U., ATP-dependent protein refolding activity in reticulocyte lysate. Evidence for the participation of different chaperone components. FEBS Lett. 331, 25–30 (1993).
  • 18. Svetlov M. S., Effective cotranslational folding of firefly luciferase without chaperones of the Hsp70 family. Protein Sci. 15, 242–247 (2006).
  • 19. Balchin D., Hayer-Hartl M., Hartl F. U., In vivo aspects of protein folding and quality control. Science 353, aac4354 (2016).
  • 20. Szabo A., et al., The ATP hydrolysis-dependent reaction cycle of the Escherichia coli Hsp70 system DnaK, DnaJ, and GrpE. Proc. Natl. Acad. Sci. U.S.A. 91, 10345–10349 (1994).
  • 21. Kramer G., Boehringer D., Ban N., Bukau B., The ribosome as a platform for co-translational processing, folding and targeting of newly synthesized proteins. Nat. Struct. Mol. Biol. 16, 589–597 (2009).
  • 22. Becker A. H., Oh E., Weissman J. S., Kramer G., Bukau B., Selective ribosome profiling as a tool for studying the interaction of chaperones and targeting factors with nascent polypeptide chains and ribosomes. Nat. Protocols 8, 2212–2239 (2013).
  • 23. Döring K., et al., Profiling Ssb-Nascent chain interactions reveals principles of Hsp70-assisted folding. Cell 170, 298–311.e20 (2017).
  • 24. Zhou M., et al., Non-optimal codon usage affects expression, structure and function of clock protein FRQ. Nature 495, 111–115 (2013).
  • 25. Siller E., DeZwaan D. C., Anderson J. F., Freeman B. C., Barral J. M., Slowing bacterial translation speed enhances eukaryotic protein folding efficiency. J. Mol. Biol. 396, 1310–1318 (2010).
  • 26. Walsh I. M., Bowman M. A., Soto Santarriaga I. F., Rodriguez A., Clark P. L., Synonymous codon substitutions perturb cotranslational protein folding in vivo and impair cell fitness. Proc. Natl. Acad. Sci. U.S.A. 117, 3528–3534 (2020).
  • 27. Goldenberg D. P., Creighton T. E., Circular and circularly permuted forms of bovine pancreatic trypsin inhibitor. J. Mol. Biol. 165, 407–413 (1983).
  • 28.Luger K., Hommel U., Herold M., Hofsteenge J., Kirschner K., Correct folding of circularly permuted variants of a beta alpha barrel enzyme in vivo. Science 243, 206–210 (1989). [DOI] [PubMed] [Google Scholar]
  • 29.Zhang T., Bertelsen E., Benvegnu D., Alber T., Circular permutation of T4 lysozyme. Biochemistry 32, 12311–12318 (1993). [DOI] [PubMed] [Google Scholar]
  • 30.Qian Z., Horton J. R., Cheng X., Lutz S., Structural redesign of lipase B from Candida antarctica by circular permutation and incremental truncation. J. Mol. Biol. 393, 191–201 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Yu Y., Lutz S., Circular permutation: A different way to engineer enzyme structure and function. Trends Biotechnol. 29, 18–25 (2011). [DOI] [PubMed] [Google Scholar]
  • 32.Viguera A. R., Blanco F. J., Serrano L., The order of secondary structure elements does not determine the structure of a protein but does affect its folding kinetics. J. Mol. Biol. 247, 670–681 (1995). [DOI] [PubMed] [Google Scholar]
  • 33.Whitehead T. A., Bergeron L. M., Clark D. S., Tying up the loose ends: Circular permutation decreases the proteolytic susceptibility of recombinant proteins. Protein Eng. Des. Sel. 22, 607–613 (2009). [DOI] [PubMed] [Google Scholar]
  • 34.Marsden A. P., et al. , Investigating the effect of chain connectivity on the folding of a beta-sheet protein on and off the ribosome. J. Mol. Biol. 430, 5207–5216 (2018), 10.1016/j.jmb.2018.10.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Samelson A. J., et al. , Kinetic and structural comparison of a protein’s cotranslational folding and refolding pathways. Sci. Adv. 4, eaas9098 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Los G. V., et al. , HaloTag: A novel protein labeling technology for cell imaging and protein analysis. ACS Chem. Biol. 3, 373–382 (2008). [DOI] [PubMed] [Google Scholar]
  • 37.David L., et al. , The α/β hydrolase fold. Protein Eng. Des. Sel. 5, 197–211 (1992). [Google Scholar]
  • 38.Ollis D. L., et al. , The alpha/beta hydrolase fold. Protein Eng. 5, 197–211 (1992). [DOI] [PubMed] [Google Scholar]
  • 39.Love M. I., Huber W., Anders S., Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Armstrong R. A., When to use the Bonferroni correction. Ophthalmic Physiol. Opt. 34, 502–508 (2014). [DOI] [PubMed] [Google Scholar]
  • 41.Baba T., et al. , Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: The Keio collection. Mol. Syst. Biol. 2, 2006.0008 (2006). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Plaxco K. W., Simons K. T., Baker D., Contact order, transition state placement and the refolding rates of single domain proteins. J. Mol. Biol. 277, 985–994 (1998). [DOI] [PubMed] [Google Scholar]
  • 43.Miller E. J., Fischer K. F., Marqusee S., Experimental evaluation of topological parameters determining protein-folding rates. Proc. Natl. Acad. Sci. U.S.A. 99, 10359–10363 (2002). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Gonda D. K., et al. , Universality and structure of the N-end rule. J. Biol. Chem. 264, 16700–16712 (1989). [PubMed] [Google Scholar]
  • 45.Tobias J. W., Shrader T. E., Rocap G., Varshavsky A., The N-end rule in bacteria. Science 254, 1374–1377 (1991). [DOI] [PubMed] [Google Scholar]
  • 46.Varshavsky A., The N-end rule: Functions, mysteries, uses. Proc. Natl. Acad. Sci. U.S.A. 93, 12142–12149 (1996). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Lo W.-C., et al. , Deciphering the preference and predicting the viability of circular permutations in proteins. PLoS One 7, e31791 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Lo W.-C., et al. , CPred: A web server for predicting viable circular permutations in proteins. Nucleic Acids Res. 40, W232–W237 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Marks D. S., Hopf T. A., Sander C., Protein structure prediction from sequence variation. Nat. Biotechnol. 30, 1072–1080 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Street T. O., Courtemanche N., Barrick D., “Protein folding and stability using denaturants” in Methods in Cell Biology, Correia J. J., Detrich H. W. III, Eds. (Elsevier, 2008), pp. 295–325. [DOI] [PubMed] [Google Scholar]
  • 51.Barrick D., Baldwin R. L., Three-state analysis of sperm whale apomyoglobin folding. Biochemistry 32, 3790–3796 (1993). [DOI] [PubMed] [Google Scholar]
  • 52.Ishikawa H., Meng F., Kondo N., Iwamoto A., Matsuda Z., Generation of a dual-functional split-reporter protein for monitoring membrane fusion using self-associating split GFP. Protein Eng. Des. Sel. 25, 813–820 (2012). [DOI] [PubMed] [Google Scholar]
  • 53.Deo C., et al. , The HaloTag as a general scaffold for far-red tunable chemigenetic indicators. Nat. Chem. Biol. 17, 718–723 (2021). [DOI] [PubMed] [Google Scholar]
  • 54.Englander S. W., Hydrogen exchange and mass spectrometry: A historical perspective. J. Am. Soc. Mass Spectrom. 17, 1481–1489 (2006). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Zheng J., Strutzenberg T., Pascal B. D., Griffin P. R., Protein dynamics and conformational changes explored by hydrogen/deuterium exchange mass spectrometry. Curr. Opin. Struct. Biol. 58, 305–313 (2019). [DOI] [PubMed] [Google Scholar]
  • 56.Hu W., et al. , Stepwise protein folding at near amino acid resolution by hydrogen exchange and mass spectrometry. Proc. Natl. Acad. Sci. U.S.A. 110, 7684–7689 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Walters B. T., Mayne L., Hinshaw J. R., Sosnick T. R., Englander S. W., Folding of a large protein at high structural resolution. Proc. Natl. Acad. Sci. U.S.A. 110, 18898–18903 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Costello S. M., et al. , The SARS-CoV-2 spike reversibly samples an open-trimer conformation exposing novel epitopes. Nat. Struct. Mol. Biol. 29, 229–238 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Kawagoe S., Nakagawa H., Kumeta H., Ishimori K., Saio T., Structural insight into proline cis/trans isomerization of unfolded proteins catalyzed by the trigger factor chaperone. J. Biol. Chem. 293, 15095–15106 (2018), 10.1074/jbc.RA118.003579. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Thompson S., Zhang Y., Ingle C., Reynolds K. A., Kortemme T., Altered expression of a quality control protease in E. coli reshapes the in vivo mutational landscape of a model enzyme. Elife 9, e53476 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Hennecke J., Sebbel P., Glockshuber R., Random circular permutation of DsbA reveals segments that are essential for protein folding and stability. J. Mol. Biol. 286, 1197–1215 (1999). [DOI] [PubMed] [Google Scholar]
  • 62.Haglund E., Lindberg M. O., Oliveberg M., Changes of protein folding pathways by circular permutation: Overlapping nuclei promote global cooperativity. J. Biol. Chem. 283, 27904–27915 (2008). [DOI] [PubMed] [Google Scholar]
  • 63.Iwakura M., Nakamura T., Yamane C., Maki K., Systematic circular permutation of an entire protein reveals essential folding elements. Nat. Struct. Biol. 7, 580–585 (2000). [DOI] [PubMed] [Google Scholar]
  • 64.Jones B. E., Robert Matthews C., Early intermediates in the folding of dihydrofolate reductase from escherichia coli detected by hydrogen exchange and NMR. Protein Sci. 4, 167–177 (1995). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Greenfield N. J., Using circular dichroism spectra to estimate protein secondary structure. Nat. Protoc. 1, 2876–2890 (2006). [DOI] [PMC free article] [PubMed] [Google Scholar]

Supplementary Materials

Appendix 01 (PDF)

Dataset S01 (CSV)

Data Availability Statement

All study data are included in the article and/or supporting information.

