Author manuscript; available in PMC: 2022 Oct 1.
Published in final edited form as: Biopolymers. 2021 Sep 9;112(10):e23471. doi: 10.1002/bip.23471

A sequence-based method for predicting extant fold switchers that undergo α-helix <-> β-strand transitions

Soumya Mishra 1,3, Loren L Looger 3, Lauren L Porter 1,2,*
PMCID: PMC8545793  NIHMSID: NIHMS1733490  PMID: 34498740

Abstract

Extant fold-switching proteins remodel their secondary structures and change their functions in response to cellular stimuli, regulating biological processes and affecting human health. Despite their biological importance, these proteins remain understudied. Predictive methods are needed to expedite the process of discovering and characterizing more of these shapeshifting proteins. Most previous approaches require a solved structure or all-atom simulations, greatly constraining their use. Here, we propose a high-throughput sequence-based method for predicting extant fold switchers that transition from α-helix in one conformation to β-strand in the other. This method leverages two previous observations: (1) α-helix <-> β-strand prediction discrepancies from JPred4 are a robust predictor of fold switching, and (2) the fold-switching regions (FSRs) of some extant fold switchers have different secondary structure propensities when expressed by themselves (isolated FSRs) than when expressed within the context of their parent protein (contextualized FSRs). Combining these two observations, we ran JPred4 on 99 fold-switching proteins and found strong correspondence between predicted and experimentally observed α-helix <-> β-strand discrepancies. To test the overall robustness of this finding, we randomly selected regions of proteins not expected to switch folds (single-fold proteins) and found significantly fewer predicted α-helix <-> β-strand discrepancies. Combining these discrepancies with the overall percentage of predicted secondary structure, we developed a classifier to identify extant fold switchers (Matthews Correlation Coefficient of 0.71). Although this classifier had a high false negative rate (6/17), its false positive rate was very low (2/136), suggesting that it can be used to predict a subset of extant fold switchers from a multitude of available genomic sequences.

1. Introduction

Extant fold-switching proteins remodel their secondary structures and change their functions in response to cellular stimuli1. These environmentally responsive shapeshifters perform over 30 diverse functions, occur in all domains of life, and are associated with diseases such as cancer2, autoimmune disorders3, and malaria4. Furthermore, increasing evidence suggests that extant fold switchers regulate biological processes5 such as cyanobacterial circadian rhythms6 and transcription/translation of bacterial virulence genes7.

Compared with single-fold proteins, which maintain stable secondary and tertiary structures and typically perform one biological function, extant fold switchers are understudied. Specifically, out of the ~160,000 proteins with solved structures available in the Protein Data Bank (PDB8), fewer than 100 have been shown to switch folds. Increasing evidence suggests that fold switching is likely more widespread than currently appreciated1, but the current shortage of experimental examples makes it difficult to determine either the physical chemical properties or the functional scope of fold switchers. Thus, predictive tools are needed to identify more.

Recent computational studies suggest that fold switching is predictable, a prospect that—if realized—could greatly expand the small pool of experimentally determined fold switchers currently available. For example, naturally occurring extant fold switchers were predicted blindly by searching for differences between predicted and experimentally determined protein structures 1, 9. Furthermore, several fold-switching proteins have been designed computationally using the Rosetta software suite10, 11. Progress has also been made in predicting mutation-induced fold switching12, 13 as well as other conformational changes, such as rigid body motions14. Finally, a classifier for extant fold switchers was recently developed as a proof of concept that fold switching is predictable from protein sequence15. This classifier is based on confidences of all secondary structure predictions (helix, strand and coil), whereas the one we developed relies on discrepancies between predicted α-helices and β-strands.

Here we present a sequence-based method for predicting extant fold switchers. This method builds on our previous approach designed for evolved fold switchers, which are defined to have highly similar sequences but different folds12. By contrast, extant fold switchers have one sequence that can assume more than one stable secondary and tertiary structure configuration. Whereas the approach for evolved fold switchers compared secondary structure predictions of two (or more) different proteins with slightly different sequences, the current method identifies extant fold switchers from the secondary structure predictions of different regions from a single amino acid sequence. The following hypothesis provides the basis for our method: the JPred4 secondary structure prediction of an isolated fold-switching region (FSR) sequence might differ from the JPred4 prediction of the same FSR within the context of its naturally occurring sequence (hereafter called a contextualized FSR). We developed this hypothesis using the previous observation1 that extant fold-switching proteins generally have: (1) regions that change secondary structure between the two forms (FSRs) and (2) regions that maintain the same secondary structure (structurally constant regions, or SCRs16). By definition, FSRs assume multiple stable secondary structures, and several studies have suggested that at least one FSR conformation is stabilized by exogenous interactions17, 18. Together, these observations indicate that the dominant secondary structure of a given FSR might differ depending on the context of its sequence. Thus, we tested our approach on 99 extant fold switchers with the aim of developing a classifier that could distinguish extant fold switchers from single-folding proteins.

2. Methods

2.1. Selection of extant fold switchers

We selected 93 extant fold switchers from a previous dataset1. We excluded 2GED/1NRJB and 3VO9B/3VPAA because they had nearly identical structures but were misclassified due to missing crystal density. We also excluded 1MBYA/2N19A because they come from different organisms, their FSRs differ by 3 amino acids, and their resting states appear to assume different conformations. Thus, they appear to be evolved–rather than extant–fold switchers. In addition to these 93 extant fold switchers, we included the bacterial cell-division protein MinE19, 20, SARS-CoV-2 ORF9b21, and the human apoptosis regulator BAX22, which have all been shown to switch folds, as well as 3 KaiB homologs presumed to switch folds since they come from cyanobacterial strains similar to S. elongatus.

2.2. JPred4 predictions of extant fold switchers

All amino acid sequences from 99 extant fold switchers with solved structures were downloaded from the Protein Data Bank (PDB) and saved as individual FASTA files. JPred4 predictions were run remotely using a publicly downloadable scheduler available on the JPred4 website (http://compbio.dundee.ac.uk/jpred/), and jnetpred predictions were used for all calculations. Jnetpred maximizes accuracy by combining sequence profiles from HMMer23 and PSI-BLAST24, and we found previously that it identifies fold switchers more robustly than other secondary structure predictors12. Each residue was assigned one of three secondary structures: “H” for helix, “E” for extended β-strand, and “C” for coil. Chain breaks were annotated “-”. PDB IDs and chains of each fold-switched pair, as well as their FSR boundaries, are reported in Table S1. FSR boundaries were initially chosen based on the regions reported previously (bold sequences in Table S2 of1). PimA, KaiB, and RfaH were shortened to yield secondary structure prediction discrepancies, and an additional 11 residues were also added to the N-terminal end of PimA’s FSR. Such modifications seemed reasonable since JPred4 makes predictions based on a 20-residue window25 that it could use to associate an isolated fragment with its contextualized secondary structure prediction. Thus, modifying short stretches of N- and C-terminal sequence could decrease the association between isolated sequences and their contextualized predictions.

2.3. Observed secondary structure discrepancies

Secondary structure classifications of the 93 extant fold switchers were taken from1, and classifications of the three KaiB variants were presumed to be the same as S. elongatus KaiB. Classifications of MinE, ORF9b, and BAX were determined using DSSP26. To quantify secondary structure difference, FSR sequences were aligned with their parents using Biopython27 pairwise2.align.localxs with gap open/extension penalties of −1.0/−0.5. Secondary structure classifications in the same register as the aligned FSR sequences were extracted from both experimentally determined structures. Helix <-> strand discrepancies between the classifications were summed residue-by-residue (1 for discrepancy, 0 for no discrepancy) and normalized by FSR length. Pearson correlations were calculated using the corrcoef function from Numpy28, and linear fits were determined using Scipy29 stats.linregress. Our benchmark set was selected by maximizing:

$$\frac{\mathrm{TP}^{2}}{\mathrm{Total}},$$

where TP is the number of true positives and Total is the total number of proteins (true positives + false negatives). Since all 99 proteins switch folds, correct predictions were true positives and incorrect ones were false negatives.
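The selection criterion can be illustrated with a minimal sketch. Because TP²/Total equals TP × (TP/Total), it rewards both the count and the fraction of true positives. The sketch below scans candidate cutoffs on the observed discrepancy fraction and keeps the one with the highest score. The function names, the rule that any nonzero predicted discrepancy counts as a correct prediction, and the toy values are all assumptions made for illustration; they are not taken from the study's data or code.

```python
def benchmark_score(tp, total):
    """Return TP^2 / Total, or 0.0 for an empty set."""
    return tp * tp / total if total else 0.0


def select_cutoff(observed, predicted, cutoffs, min_predicted=0.0):
    """Pick the observed-discrepancy cutoff that maximizes TP^2 / Total.

    observed, predicted: per-protein discrepancy fractions (same order).
    A protein joins the candidate set when observed >= cutoff, and it is
    counted as a true positive when predicted > min_predicted
    (hypothetical rule chosen for this illustration).
    """
    best = None
    for cutoff in cutoffs:
        members = [i for i, obs in enumerate(observed) if obs >= cutoff]
        tp = sum(1 for i in members if predicted[i] > min_predicted)
        score = benchmark_score(tp, len(members))
        if best is None or score > best[1]:
            best = (cutoff, score, tp, len(members))
    return best


# Toy values for illustration only (not data from the study).
observed = [0.05, 0.10, 0.20, 0.30, 0.40]
predicted = [0.00, 0.00, 0.25, 0.00, 0.35]
cutoff, score, tp, total = select_cutoff(observed, predicted, [0.0, 0.15, 0.35])
print(cutoff, tp, total)
```

With these toy values the middle cutoff wins: it drops two proteins with no predicted discrepancy while keeping both true positives.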

2.4. Single-fold proteins and fragments

Proteins expected not to switch folds and having fewer than 800 residues (the upper limit in JPred4), totaling 211, were taken from Table S3C of1. One segment was selected from a random region of each protein. Segment lengths were randomly selected from a distribution of FSR lengths ranging from 20–41, the range of lengths in our benchmark set. Random selections were performed using the random module of Python 2.7. JPred4 was run on all 422 sequences (211 full sequences + 211 segments) using its mass-submit scheduler (http://www.compbio.dundee.ac.uk/jpred4/api.shtml#massSubmit).
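A minimal sketch of the segment selection follows, assuming a list of benchmark FSR lengths is available and that the start position is drawn uniformly. The function name, the placeholder length list, and the toy sequence are hypothetical, and the sketch is written for Python 3 rather than the Python 2.7 used in the study.

```python
import random


def random_segment(sequence, fsr_lengths, rng):
    """Draw one segment whose length is sampled from observed FSR lengths.

    Returns (start, segment). The start position is uniform over all
    positions at which the segment fits (an assumption of this sketch).
    """
    length = min(rng.choice(fsr_lengths), len(sequence))
    start = rng.randint(0, len(sequence) - length)  # randint is inclusive
    return start, sequence[start:start + length]


# Placeholder lengths spanning the 20-41 range; not the benchmark values.
fsr_lengths = [20, 24, 28, 33, 41]
rng = random.Random(0)  # seeded so the draw is reproducible
toy_sequence = "ACDEFGHIKLMNPQRSTVWY" * 10  # 200-residue toy sequence
start, segment = random_segment(toy_sequence, fsr_lengths, rng)
print(start, len(segment))
```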

2.5. Helix <-> strand discrepancies and distribution

Sequences of isolated FSRs were aligned with full-length proteins using the pairwise2.align.localxs function from Biopython27 with gap open/extension penalties of −1.0/−0.5. Secondary structure predictions were re-registered according to the resulting alignments and compared. Helix <-> strand discrepancies between the predictions were summed residue-by-residue (1 for discrepancy, 0 for no discrepancy) and normalized by FSR length. An overall view of our predictive method (Sections 2.2–2.5) is shown in Scheme 1.
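The discrepancy score described above can be sketched as follows. To keep the example free of third-party packages, an exact substring match (str.find) stands in for the Biopython local alignment, which is a simplification that only works when the FSR occurs verbatim in the parent sequence. The function names and the toy sequences and predictions are hypothetical.

```python
def contextual_prediction(full_seq, full_ss, fsr_seq):
    """Extract the full-length prediction in register with the FSR.

    Uses an exact substring match in place of the local alignment
    described in the text (a simplification for this sketch).
    """
    start = full_seq.find(fsr_seq)
    if start < 0:
        raise ValueError("FSR not found verbatim in the parent sequence")
    return full_ss[start:start + len(fsr_seq)]


def helix_strand_discrepancy(ss_a, ss_b):
    """Fraction of positions where one string has H and the other has E."""
    if len(ss_a) != len(ss_b):
        raise ValueError("predictions must be in the same register")
    if not ss_a:
        return 0.0
    mismatches = sum(1 for a, b in zip(ss_a, ss_b) if {a, b} == {"H", "E"})
    return mismatches / len(ss_a)


# Toy example: H = helix, E = strand, C = coil (not real JPred4 output).
full_seq = "MKTAYIAKQRQISFVKSHFSRQ"
full_ss = "CCHHHHHHHHHHCCEEEECCCC"
fsr_seq = "AYIAKQRQIS"
isolated_ss = "CEEEEEECCC"

context_ss = contextual_prediction(full_seq, full_ss, fsr_seq)
print(context_ss, helix_strand_discrepancy(isolated_ss, context_ss))
```

Only H/E mismatches are counted; helix-to-coil and strand-to-coil differences contribute nothing to the score, matching the residue-by-residue 1/0 scheme in the text.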

Scheme 1.

Summary of predictive approach.

2.6. Distributions and statistics

The distributions in Figures 1 and 3 were generated with Matplotlib30. Matthews Correlation Coefficients31 were calculated as follows:

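$$\mathrm{MCC} = \frac{\mathrm{TP}\times\mathrm{TN} - \mathrm{FP}\times\mathrm{FN}}{\sqrt{(\mathrm{TP}+\mathrm{FP})(\mathrm{TP}+\mathrm{FN})(\mathrm{TN}+\mathrm{FP})(\mathrm{TN}+\mathrm{FN})}},$$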

where TP = number of true positives, TN = number of true negatives, FP = number of false positives, and FN is the number of false negatives.
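As a worked check, the coefficient can be computed from the confusion-matrix counts reported in the Results (11 true positives, 134 true negatives, 2 false positives, 6 false negatives). The function below is a minimal sketch; its name and the zero-denominator convention are choices made here, not part of the original analysis code.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: return 0.0 when any marginal sum is zero.
    return numerator / denominator if denominator else 0.0


mcc = matthews_corrcoef(tp=11, tn=134, fp=2, fn=6)
print(round(mcc, 2))  # 0.71
```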

Figure 1.

Helix <-> strand discrepancies predicted by JPred4 correspond to experimentally observed α-helix <->β-strand differences in fold-switching regions. Dotted line represents best linear fit of all datapoints (black and red circles; Pearson correlation: 0.82). Red circles correspond to benchmark set of 17 fold switchers. Only 16 can be observed because two KaiB variants (4KSO and 1WWJ) overlap exactly at (0.31, 0.26).

2.7. Chameleon sequences

All 8-residue chameleon sequences (stringent criterion) with non-homologous sequences from the ChSeq32 database were tested for fold switching. Since JPred4 cannot predict the secondary structures of sequences so short, we extracted 30-residue (mean FSR length of 28 rounded to the nearest multiple of 5) fragments from their parents centered on the chameleon sequences (or as close as possible if the sequences were near termini). JPred4 was then run on all fragments and whole sequences using the mass-submit scheduler. Predictions of whole sequences and fragments were compared as in 2.5.
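The fragment extraction can be sketched as below, assuming the position of the chameleon sequence within its parent is already known. The window is centered on the 8-residue sequence and shifted inward when it would run past either terminus. The function name, the handling of parents shorter than 30 residues, and the toy parent sequence are assumptions of this sketch.

```python
def centered_fragment(parent, motif_start, motif_len=8, frag_len=30):
    """Return a frag_len window of parent centered on the motif.

    If the centered window would extend past a terminus, it is shifted
    inward so that it stays inside the parent sequence. Parents shorter
    than frag_len are returned whole (an assumption of this sketch).
    """
    if frag_len >= len(parent):
        return parent
    flank = (frag_len - motif_len) // 2  # residues on the N-terminal side
    frag_start = motif_start - flank
    frag_start = max(0, min(frag_start, len(parent) - frag_len))
    return parent[frag_start:frag_start + frag_len]


# Toy 100-residue parent; the motif positions are hypothetical.
parent = "ACDEFGHIKLMNPQRSTVWY" * 5
print(centered_fragment(parent, motif_start=50))  # interior motif
print(centered_fragment(parent, motif_start=2))   # near the N-terminus
print(centered_fragment(parent, motif_start=92))  # near the C-terminus
```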

3. Results

3.1. JPred4 predicts fold switchers that undergo α-helix <->β-strand transitions

We sought to determine whether JPred4 can identify fold-switching regions (FSRs) of extant fold switchers. To do this, JPred4 predictions of isolated FSR sequences and FSRs within their parent sequences (hereafter called contextualized FSRs, Methods) were compared for 99 experimentally validated fold switchers. A moderate Pearson correlation (0.67) was observed between predicted and experimentally observed α-helix <->β-strand discrepancies (Figure S1), indicating that JPred4 can identify some fold switchers that undergo α-helix <->β-strand transitions. False positives with no observed α<->β transitions were eliminated by removing fragments with high levels of predicted coil (≥65%), improving the overall correlation substantially (0.82, Figure 1). Together, these results indicate that our method can effectively identify some fold switchers that undergo α-helix <->β-strand transitions, but not fold switchers that undergo other types of secondary structure transitions, such as shifts in β-sheet register.

3.2. Extant fold switchers with sizeable α-helix <-> β-sheet transitions

We selected a benchmark set of 17 fold switchers by determining the fraction of observed α<-> β discrepancies that maximized both the percentage and the total number of true positives (Methods, fraction = 0.18, Figure 1, Figure S2). Ten members of this set are highlighted briefly in Figure 2, and all are reported in Table S2:

  • Selecase (“selective and specific caseinolytic metallopeptidase”; Figure 2A), produced by archaea and bacteria, and most studied from the archaeon Methanocaldococcus jannaschii, is an active metallopeptidase in its monomeric form. Upon forming structured higher order oligomers, namely dimers, tetramers, and octamers, Selecase is inactivated33. Its structures and activities are regulated by its concentration: mostly monomers at 0–0.3 mg/ml; dimers at 0.3–2 mg/ml; tetramers at 2–6 mg/ml, and octamers at > 6 mg/ml.

  • RfaH (Figure 2B) regulates the expression of virulence proteins from enterobacteria such as Escherichia coli34. It has two domains: an N-terminal NGN-binding domain (NTD) and a C-terminal domain (CTD) that switches folds. RfaH’s CTD folds into an α-helical bundle that forms a binding interface with the NTD, masking its RNA polymerase (RNAP) binding site. Upon binding both RNAP and a specific DNA consensus sequence, called ops, the CTD dissociates from the NTD, unmasking the NTD’s RNAP binding site. This binding event also triggers the CTD to reversibly refold into a β-barrel able to bind the integral S10 unit of the ribosome and foster efficient translation35. When expressed in isolation, RfaH’s CTD folds into a β-barrel with no trace of α-helical content (green structure)35.

  • PimA (Figure 2C) is a membrane-associated bacterial glycosyltransferase (phosphatidyl-myo-inositol mannosyltransferase) that initiates the biosynthesis of virulence factors produced by Mycobacterium tuberculosis. This enzyme has both a closed GDP-bound form and an open form with reshuffled secondary structure. PimA’s FSR is highly conserved in mycobacterial orthologs, and both crystallographic and near-UV CD evidence indicate that its open form could play an important role in membrane interactions36.

  • KaiB (Figure 2D) is a major component of the cyanobacterial circadian clock of Synechococcus elongatus6. Unlike most other circadian clocks, which are driven by transcription-translation oscillation, the cyanobacterial circadian clock is maintained through a periodic phosphorylation cycle, known as a post-translational oscillator (PTO)37. At night, KaiB’s active monomeric form helps to regulate the dephosphorylation of the PTO, while in the morning it primarily populates an inactive tetramer with a different fold, allowing phosphorylation of the PTO.

  • Ovalbumin (Figure 2E) is a member of the serpin family (serine protease inhibitor; although ovalbumin is not known to have in situ inhibitory activity – it constitutes 60–65% of egg whites and appears to be a storage protein38) with a zymogenic form (i.e., an inactive precursor, as has plasmepsin). Specifically, inactive ovalbumin has a reactive center loop (RCL) that, when cleaved by a serine protease such as subtilisin, forms a β-strand inserted between two pairs of β-hairpins on its surface. Additionally, the α-helix formed by ovalbumin’s uncleaved RCL is regular and less flexible than the distorted helices of inhibitory serpins39.

  • MinE (Figure 2F) is part of a three-component protein oscillator that helps to regulate bacterial cell division19. In its resting state, MinE forms a homodimer with six β-strands (3 from each monomer) and 4 α-helices (two from each monomer). When bound to MinD, another component of the oscillator, MinE’s two central β-strands are extruded from its dimer interface and refold into helices that bind MinD20, stimulating MinD’s ATPase activity and leading to membrane release.

  • ORF9b (Figure 2G) is from the genome of the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). Expressed in isolation, it forms a homodimer composed of β-sheets. When bound to human Tom70, however, it refolds into an α-helix with one of two possible cellular effects21: (1) modulating interferon and apoptosis signaling or (2) decreasing mitochondrial import efficiency, leading to mitophagy. JPred4 has been used previously to predict ORF9b’s fold switching40.

  • The human amyloid-forming proteins α-Synuclein and amyloid β (Figures 2H & I, respectively), along with amylin (Table S2), are all believed to interact with membranes, where they form α-helices41–43. While the cognate functions of helical α-synuclein and amyloid β remain under investigation, amylin is an endocrine hormone (co-secreted with insulin) that regulates glycemic metabolism42. All three peptides can also form fibrillar deposits associated with diseases such as Parkinson’s (α-Synuclein)44, type 2 diabetes (amylin)45, and Alzheimer’s (amyloid β)46.

  • BAX is a human protein involved in mitochondrial apoptosis. It assumes an all α-helical fold in the cytosol, and membrane insertion of its C-terminal helix appears to foster its apoptotic function. Several lines of experimental evidence (e.g. mass spectrometry, electron microscopy, and circular dichroism spectroscopy) indicate that BAX refolds into β-sheet fibrils when bound to the humanin peptide22. Furthermore, light-scattering experiments demonstrate that its C-terminal helix propagates fibril formation22. This refolding is believed to sequester BAX, preventing it from initiating mitochondrial apoptosis.

Figure 2.

JPred4 predicts different secondary structures for isolated and contextualized FSRs of extant fold switchers with substantive transitions between α-helix and β-strand. Each panel shows the experimentally determined secondary structures of both conformations of the fold switcher (purple and green) along with JPred4 secondary structure predictions of the whole sequence (black) and FSR (gray). Purple and green regions of protein structure correspond to FSR sequence shown in diagram; gray correspond to Structurally Constant Regions (SCRs). Predicted secondary structures that were at least 2 contiguous residues long are shown. The KaiB variant (2QKE) represents all members of the KaiB family; the other 3 (1WWJ, 4KSO, 1R5P) are not shown; amylin is also not shown due to lack of space. The differential secondary structure predictions for ORF9b were reported previously40. The green secondary structure diagram of BAX is shaded with lines to signify that its structure has not been solved, though other experimental evidence strongly suggests that it folds into a β-sheet. All three-dimensional protein structures were made using PyMOL.

3.3. JPred4 discriminates between FSRs and false positives with statistical significance

In all cases shown in Figure 2, along with 5/7 of the other proteins in our benchmark set (Table S2), we found that JPred4 predicted different secondary structures for isolated and contextualized FSR sequences. JPred4 secondary structure predictions tend to correspond reasonably well (<6% α-helix <-> β-strand discrepancies9) with at least one experimentally determined protein structure for all 14 proteins. In fact, in all 17 cases, α-helix and β-strand secondary structure elements correspond well between one prediction and one experimentally determined conformation (correct secondary structures in all of the right positions, though not necessarily the experimentally determined lengths). However, the alternative JPred4 predictions generally do not correspond well with the alternative secondary structure prediction, except for PimA and KaiB. Nevertheless, as in previous work12, we use discrepancies between predictions to infer fold switching; for our purposes the accuracies of the JPred4 predictions have no bearing on this inference.

3.3. JPred4 discriminates between FSRs and single-folding regions

To determine the significance of JPred4’s α-helix <-> β-strand prediction discrepancies for isolated and contextualized FSRs, we randomly selected fragments from a set of 211 proteins expected not to switch folds (single-fold proteins). Upon eliminating all predictions with ≥65% coil, 136 predictions remained.

Predictions of single folders and fold switchers are compared in Figure 3. We noticed that 11/17 of the fold-switching proteins in our benchmark set had predicted helix<->strand discrepancies ≥20%, while only 2/136 of single folders had helix<->strand discrepancies at the same threshold. One of these false positives comprised residues 12–48 from the glutathione S-transferase (GST) Omega 3 expressed by the silkworm Bombyx mori. Residue 29 of this segment is an asparagine, which replaces a highly conserved cysteine in the other members of the Omega family47. This single amino acid change is partially responsible for Omega 3’s loss of GST activity: mutating asparagine 29 to a cysteine while also deleting its flexible C-terminal helix restores GST activity. Interestingly, running JPred4 on the same segment (residues 12–48) with just an N29C mutation gives the same secondary structure prediction as that of the whole protein (Table S3)47. Based on our previous work on sequence-similar fold switchers12, this result suggests that this protein segment might switch folds and thus might not be a false positive after all. The other false positive comprised residues 93–142 of Bd3460, a self-protection protein from Bdellovibrio bacteriovorus that assumes an ankyrin-like fold48. No obvious reason for the fold switching misclassification was identified.

Figure 3.

JPred4 discriminates between single folders and fold switchers. Single folders/fold switchers are blue circles/red triangles. The dashed line represents the threshold for classifying fold switchers by fraction of predicted secondary structure/fraction of α-helix <-> β-strand discrepancies (0.2). Datapoints at or above this threshold are predicted to switch folds. Only 16/17 fold switchers can be seen because two KaiB variants have identical coordinates (0.37,0.26).

At a 20% threshold for predicted α-helix<->β-strand discrepancies, our method yielded 11 true positives, 2 false positives, 134 true negatives, and 6 false negatives, resulting in a Matthews Correlation Coefficient of 0.71 (very low false positive rate; moderate false negative rate). In 4/6 false negatives, α-helix <->β-strand discrepancies were predicted, but they were not large enough to exceed the 20% threshold for the classifier. JPred4 may have misclassified the six false negatives for two reasons. Firstly, we suspect that the sequence profiles generated for FSRs and whole proteins were similar, leading to identical JPred4 predictions. Secondly, database population may have played a role in the misclassification. Specifically, sequences associated with one fold may have been more highly represented than sequences associated with the other.
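The two-step decision rule (coil filter, then discrepancy threshold) can be sketched as below. The thresholds of 65% coil and 20% discrepancy come from the text. Applying the coil filter to the isolated-fragment prediction, the function names, the returned labels, and the toy prediction strings are assumptions of this sketch.

```python
COIL_CUTOFF = 0.65         # predictions with >= 65% coil are excluded
DISCREPANCY_CUTOFF = 0.20  # >= 20% helix<->strand discrepancy is positive


def coil_fraction(ss):
    """Fraction of residues predicted as coil ('C')."""
    return ss.count("C") / len(ss)


def discrepancy_fraction(ss_a, ss_b):
    """Fraction of aligned positions with an H-versus-E mismatch."""
    mismatches = sum(1 for a, b in zip(ss_a, ss_b) if {a, b} == {"H", "E"})
    return mismatches / len(ss_a)


def classify(isolated_ss, contextual_ss):
    """Label one fragment from its isolated and contextualized predictions.

    Assumes both strings are non-empty and already in the same register.
    The coil filter is applied to the isolated prediction (an assumption).
    """
    if coil_fraction(isolated_ss) >= COIL_CUTOFF:
        return "excluded"
    if discrepancy_fraction(isolated_ss, contextual_ss) >= DISCREPANCY_CUTOFF:
        return "fold switcher"
    return "single folder"


# Toy predictions for illustration (not real JPred4 output).
print(classify("CEEEEEECCC", "HHHHHHHHHC"))  # large H<->E discrepancy
print(classify("CCCCCCCEEC", "HHHHHHHHHC"))  # mostly coil
print(classify("HHHHHHCCCC", "HHHHHHHCCC"))  # no H<->E discrepancy
```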

3.4. JPred4 does not systematically classify chameleon sequences as fold switchers

To further test the robustness of our classifier, we ran our JPred4-based method on 45 nonhomologous chameleon sequences from the ChSeq database32. Chameleon sequences are identical sequences that assume α-helices in some proteins and β-strands in others but are not associated with fold switching49. Of the 36 sequences with <65% coil predicted, 5 were classified as putative fold switchers (Table S4). Thus, while our method sometimes misclassifies chameleon sequences as fold switchers, this misclassification is not systematic.

4. Discussion

Fold switchers are exceptions to the observation that folded proteins assume one stable structure that performs one function. Nevertheless, increasing evidence suggests that these proteins may be more abundant in nature than previously thought1. Fold switching impacts protein function5 and is associated with multiple diseases2, 3, 50. Thus, it would be useful to have a bioinformatic algorithm that identifies more fold switchers from their sequences. This is especially true because, up to this point, all experimentally characterized fold switchers have been stumbled upon by chance1.

Here we present an approach for predicting extant fold switchers from their amino acid sequences alone. This method is based on previous experimental work suggesting that the fold-switching regions (FSRs) of proteins are context-dependent: that is, their conformations are determined by their environment17, 18. In light of this, we hypothesized that it might be possible to predict extant fold switchers by comparing the JPred4 secondary structure predictions of isolated FSRs with contextualized FSRs and searching for α-helix <-> β-strand discrepancies. Indeed, significant discrepancies were found in 11/17 fold switchers used in this study. We used this finding to develop a classifier for extant fold switchers that yielded a Matthews Correlation Coefficient of 0.71. We suspect that JPred4 successfully identified extant fold switchers for the same reason it identified sequence-similar fold switchers12: different sequences (contextualized and isolated FSRs in this case) yielded different sequence profiles from PSI-BLAST searches. Future work revealing how these different profiles lead to dramatically different secondary structure predictions would be useful.

Two additional results stand out in light of our previous method12, which predicts evolved fold switchers with highly similar sequences. First, the method presented here predicts fold switching in all four KaiB variants tested. This positive result is an improvement over our previous method for sequence-similar fold switchers, which failed to predict fold switching in all KaiB variants12. Secondly, our results strongly suggest that the fragment from Omega 3 is an FSR, even though it was in our set of proteins not expected to switch folds. Just one mutation (N29C) is sufficient to dramatically change the secondary structure predictions of this sequence, a previously identified characteristic of sequence-similar fold switchers (proteins with highly similar—but not identical—amino acid sequences and different folds12). Additionally, Omega 3’s GST topology47 has been known to switch folds in other proteins, namely KaiB51 and Chloride Intracellular Channel 1 (CLIC1)52. Still, further experimental work would be needed to determine whether Omega 3 switches folds.

Although we are optimistic that the approach presented here can be used to predict novel fold switchers, it has several limitations. Firstly, it can only identify fold switchers that undergo large α-helix <-> β-sheet transitions. To date, these proteins are rare and comprise only 17% of known fold switchers. Biologically important fold switchers like lymphotactin53, which maintains β-sheets that change their hydrogen bonding register, and most β-pore proteins54, which extend already existing β-sheet structures, will be missed. Secondly, it will not identify all fold switchers that undergo large α-helix <-> β-sheet transitions, as evidenced by the fact that only 11/17 of the fold switchers tested gave a robust enough signal to be classified positively. Thirdly, because the FSRs of undiscovered fold switchers are not known a priori, our method will likely need to test many putative FSRs (different sizes and different regions) within the same protein to determine whether or not it is a fold switcher. Although this approach is much less computationally intensive than all-atom simulations, it will still require substantial time and computational power to predict fold switching in thousands of genomic sequences. Furthermore, the more sequences probed, the more likely false positives will be hit. Additional work will be needed to more accurately distinguish between these false positives and true fold switchers. Finally, our training set was small, comprising the only 17 known fold switchers suitable for the predictive method presented here. Thus, it is likely that our statistics, especially for true positives and false negatives, are noisy. As more fold switchers are discovered, we are optimistic that it will be possible to develop methods that can predict more types of fold switchers with higher accuracy.
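The third limitation, scanning many putative FSRs within one protein, can be made concrete with a small sketch that enumerates candidate windows. The 20–41 residue length range is taken from the benchmark set described in the Methods. The function name, the length step, and the stride are hypothetical parameters chosen for illustration.

```python
def candidate_windows(sequence, min_len=20, max_len=41, len_step=7, stride=5):
    """Yield (start, end) indices of putative FSR windows in a sequence.

    Window lengths run from min_len to max_len in steps of len_step, and
    each length slides along the sequence in steps of stride residues.
    """
    for length in range(min_len, max_len + 1, len_step):
        for start in range(0, len(sequence) - length + 1, stride):
            yield start, start + length


toy_sequence = "ACDEFGHIKLMNPQRSTVWY" * 5  # 100-residue toy sequence
windows = list(candidate_windows(toy_sequence))
print(len(windows))  # 58
```

Even this coarse grid produces 58 fragments for a 100-residue protein, each needing its own secondary structure prediction, which illustrates why genome-scale scans would demand substantial time and why false positives accumulate as more windows are tested.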

5. Conclusions

Our results suggest that the α-helix <-> β-strand transitions of some extant fold switchers can be predicted from their sequences alone using the homology-based secondary structure predictor JPred4. Although this method will not identify all extant fold switchers whose secondary structures transition from α-helix <-> β-strand, its low false positive (2/136) and moderate true positive (11/17) rates suggest that many positive predictions will likely correspond to true extant fold switchers. Thus, we are optimistic that this approach can be used to predict a subset of extant fold switchers from the broad base of available genomic sequences.

Acknowledgements

This work utilized the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov). This work was supported in part by the Intramural Research Program of the National Library of Medicine, National Institutes of Health.

Footnotes

Conflict of Interest Statement

The authors declare no conflict of interest.

Data availability

Data and code that support the findings of this study are openly available on GitHub at https://github.com/ncbi/extant_fold_switchers

References

  • 1. Porter LL; Looger LL, Extant fold-switching proteins are widespread. Proc Natl Acad Sci U S A 2018, 115 (23), 5968–5973.
  • 2. Li BP; Mao YT; Wang Z; Chen YY; Wang Y; Zhai CY; Shi B; Liu SY; Liu JL; Chen JQ, CLIC1 Promotes the Progression of Gastric Cancer by Regulating the MAPK/AKT Pathways. Cell Physiol Biochem 2018, 46 (3), 907–924.
  • 3. Lei Y; Takahama Y, XCL1 and XCR1 in the immune system. Microbes Infect 2012, 14 (3), 262–7.
  • 4. Jain V; Kikuchi H; Oshima Y; Sharma A; Yogavel M, Structural and functional analysis of the anti-malarial drug target prolyl-tRNA synthetase. J Struct Funct Genomics 2014, 15 (4), 181–90.
  • 5. Kim AK; Porter LL, Functional and Regulatory Roles of Fold-Switching Proteins. Structure 2021, 29 (1), 6–14.
  • 6. Chang YG; Cohen SE; Phong C; Myers WK; Kim YI; Tseng R; Lin J; Zhang L; Boyd JS; Lee Y; Kang S; Lee D; Li S; Britt RD; Rust MJ; Golden SS; LiWang A, Circadian rhythms. A protein fold switch joins the circadian oscillator to clock output in cyanobacteria. Science 2015, 349 (6245), 324–8.
  • 7. Kang JY; Mooney RA; Nedialkov Y; Saba J; Mishanina TV; Artsimovitch I; Landick R; Darst SA, Structural Basis for Transcript Elongation Control by NusG Family Universal Regulators. Cell 2018, 173 (7), 1650–1662 e14.
  • 8. Berman HM; Battistuz T; Bhat TN; Bluhm WF; Bourne PE; Burkhardt K; Feng Z; Gilliland GL; Iype L; Jain S; Fagan P; Marvin J; Padilla D; Ravichandran V; Schneider B; Thanki N; Weissig H; Westbrook JD; Zardecki C, The Protein Data Bank. Acta Crystallogr D Biol Crystallogr 2002, 58 (Pt 6 No 1), 899–907.
  • 9. Mishra S; Looger LL; Porter LL, Inaccurate secondary structure predictions often indicate protein fold switching. Protein Sci 2019, 28 (8), 1487–1493.
  • 10. Ambroggio XI; Kuhlman B, Computational design of a single amino acid sequence that can switch between two distinct protein folds. J Am Chem Soc 2006, 128 (4), 1154–61.
  • 11. Wei KY; Moschidi D; Bick MJ; Nerli S; McShan AC; Carter LP; Huang PS; Fletcher DA; Sgourakis NG; Boyken SE; Baker D, Computational design of closely related proteins that adopt two well-defined but structurally divergent folds. Proc Natl Acad Sci U S A 2020, 117 (13), 7208–7215.
  • 12. Kim AK; Looger LL; Porter LL, A high-throughput predictive method for sequence-similar fold switchers. Biopolymers 2021, e23416.
  • 13.Tian P; Best RB, Exploring the sequence fitness landscape of a bridge between two protein folds. PLoS Comput Biol 2020, 16 (10), e1008285. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Sfriso P; Duran-Frigola M; Mosca R; Emperador A; Aloy P; Orozco M, Residues Coevolution Guides the Systematic Identification of Alternative Functional Conformations in Proteins. Structure 2016, 24 (1), 116–126. [DOI] [PubMed] [Google Scholar]
  • 15.Chen N; Das M; LiWang A; Wang LP, Sequence-Based Prediction of Metamorphic Behavior in Proteins. Biophys J 2020, 119 (7), 1380–1390. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Huang IK; Pei J; Grishin NV, Defining and predicting structurally conserved regions in protein superfamilies. Bioinformatics 2013, 29 (2), 175–81. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Minor DL Jr.; Kim PS, Context-dependent secondary structure formation of a designed protein sequence. Nature 1996, 380 (6576), 730–4. [DOI] [PubMed] [Google Scholar]
  • 18.Porter LL; He Y; Chen Y; Orban J; Bryan PN, Subdomain interactions foster the design of two protein pairs with approximately 80% sequence identity but different folds. Biophys J 2015, 108 (1), 154–62. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Cai M; Huang Y; Shen Y; Li M; Mizuuchi M; Ghirlando R; Mizuuchi K; Clore GM, Probing transient excited states of the bacterial cell division regulator MinE by relaxation dispersion NMR spectroscopy. Proc Natl Acad Sci U S A 2019, 116 (51), 25446–25455. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Park KT; Wu W; Battaile KP; Lovell S; Holyoak T; Lutkenhaus J, The Min oscillator uses MinD-dependent conformational changes in MinE to spatially regulate cytokinesis. Cell 2011, 146 (3), 396–407. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Gordon DE; Hiatt J; Bouhaddou M; Rezelj VV; Ulferts S; Braberg H; Jureka AS; Obernier K; Guo JZ; Batra J; Kaake RM; Weckstein AR; Owens TW; Gupta M; Pourmal S; Titus EW; Cakir M; Soucheray M; McGregor M; Cakir Z; Jang G; O’Meara MJ; Tummino TA; Zhang Z; Foussard H; Rojc A; Zhou Y; Kuchenov D; Huttenhain R; Xu J; Eckhardt M; Swaney DL; Fabius JM; Ummadi M; Tutuncuoglu B; Rathore U; Modak M; Haas P; Haas KM; Naing ZZC; Pulido EH; Shi Y; Barrio-Hernandez I; Memon D; Petsalaki E; Dunham A; Marrero MC; Burke D; Koh C; Vallet T; Silvas JA; Azumaya CM; Billesbolle C; Brilot AF; Campbell MG; Diallo A; Dickinson MS; Diwanji D; Herrera N; Hoppe N; Kratochvil HT; Liu Y; Merz GE; Moritz M; Nguyen HC; Nowotny C; Puchades C; Rizo AN; Schulze-Gahmen U; Smith AM; Sun M; Young ID; Zhao J; Asarnow D; Biel J; Bowen A; Braxton JR; Chen J; Chio CM; Chio US; Deshpande I; Doan L; Faust B; Flores S; Jin M; Kim K; Lam VL; Li F; Li J; Li YL; Li Y; Liu X; Lo M; Lopez KE; Melo AA; Moss FR 3rd; Nguyen P; Paulino J; Pawar KI; Peters JK; Pospiech TH Jr.; Safari M; Sangwan S; Schaefer K; Thomas PV; Thwin AC; Trenker R; Tse E; Tsui TKM; Wang F; Whitis N; Yu Z; Zhang K; Zhang Y; Zhou F; Saltzberg D; Consortium QSB; Hodder AJ; Shun-Shion AS; Williams DM; White KM; Rosales R; Kehrer T; Miorin L; Moreno E; Patel AH; Rihn S; Khalid MM; Vallejo-Gracia A; Fozouni P; Simoneau CR; Roth TL; Wu D; Karim MA; Ghoussaini M; Dunham I; Berardi F; Weigang S; Chazal M; Park J; Logue J; McGrath M; Weston S; Haupt R; Hastie CJ; Elliott M; Brown F; Burness KA; Reid E; Dorward M; Johnson C; Wilkinson SG; Geyer A; Giesel DM; Baillie C; Raggett S; Leech H; Toth R; Goodman N; Keough KC; Lind AL; Zoonomia C; Klesh RJ; Hemphill KR; Carlson-Stevermer J; Oki J; Holden K; Maures T; Pollard KS; Sali A; Agard DA; Cheng Y; Fraser JS; Frost A; Jura N; Kortemme T; Manglik A; Southworth DR; Stroud RM; Alessi DR; Davies P; Frieman MB; Ideker T; Abate C; Jouvenet N; Kochs G; Shoichet B; Ott M; Palmarini M; Shokat KM; Garcia-Sastre A; Rassen JA; Grosse R; Rosenberg OS; Verba KA; Basler CF; Vignuzzi M; Peden AA; Beltrao P; Krogan NJ, Comparative host-coronavirus protein interaction networks reveal pan-viral disease mechanisms. Science 2020, 370 (6521). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Morris DL; Kastner DW; Johnson S; Strub MP; He Y; Bleck CKE; Lee DY; Tjandra N, Humanin induces conformational changes in the apoptosis regulator BAX and sequesters it into fibers, preventing mitochondrial outer-membrane permeabilization. J Biol Chem 2019, 294 (50), 19055–19065. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Potter SC; Luciani A; Eddy SR; Park Y; Lopez R; Finn RD, HMMER web server: 2018 update. Nucleic Acids Res 2018, 46 (W1), W200–W204. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Altschul SF; Madden TL; Schaffer AA; Zhang J; Zhang Z; Miller W; Lipman DJ, Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25 (17), 3389–402. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Drozdetskiy A; Cole C; Procter J; Barton GJ, JPred4: a protein secondary structure prediction server. Nucleic Acids Res 2015, 43 (W1), W389–94. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Kabsch W; Sander C, Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22 (12), 2577–637. [DOI] [PubMed] [Google Scholar]
  • 27.Cock PJ; Antao T; Chang JT; Chapman BA; Cox CJ; Dalke A; Friedberg I; Hamelryck T; Kauff F; Wilczynski B; de Hoon MJ, Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 2009, 25 (11), 1422–3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Harris CR; Millman KJ; van der Walt SJ; Gommers R; Virtanen P; Cournapeau D; Wieser E; Taylor J; Berg S; Smith NJ; Kern R; Picus M; Hoyer S; van Kerkwijk MH; Brett M; Haldane A; Del Rio JF; Wiebe M; Peterson P; Gerard-Marchant P; Sheppard K; Reddy T; Weckesser W; Abbasi H; Gohlke C; Oliphant TE, Array programming with NumPy. Nature 2020, 585 (7825), 357–362. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Virtanen P; Gommers R; Oliphant TE; Haberland M; Reddy T; Cournapeau D; Burovski E; Peterson P; Weckesser W; Bright J; van der Walt SJ; Brett M; Wilson J; Millman KJ; Mayorov N; Nelson ARJ; Jones E; Kern R; Larson E; Carey CJ; Polat I; Feng Y; Moore EW; VanderPlas J; Laxalde D; Perktold J; Cimrman R; Henriksen I; Quintero EA; Harris CR; Archibald AM; Ribeiro AH; Pedregosa F; van Mulbregt P; SciPy C, SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 2020, 17 (3), 261–272. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Hunter JD, Matplotlib: A 2D graphics environment. Comput Sci Eng 2007, 9 (3), 90–95. [Google Scholar]
  • 31.Matthews BW, Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta 1975, 405 (2), 442–51. [DOI] [PubMed] [Google Scholar]
  • 32.Li W; Kinch LN; Karplus PA; Grishin NV, ChSeq: A database of chameleon sequences. Protein Sci 2015, 24 (7), 1075–86. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Lopez-Pelegrin M; Cerda-Costa N; Cintas-Pedrola A; Herranz-Trillo F; Bernado P; Peinado JR; Arolas JL; Gomis-Ruth FX, Multiple stable conformations account for reversible concentration-dependent oligomerization and autoinhibition of a metamorphic metallopeptidase. Angew Chem Int Ed Engl 2014, 53 (40), 10624–30. [DOI] [PubMed] [Google Scholar]
  • 34.Burmann BM; Knauer SH; Sevostyanova A; Schweimer K; Mooney RA; Landick R; Artsimovitch I; Rosch P, An alpha helix to beta barrel domain switch transforms the transcription factor RfaH into a translation factor. Cell 2012, 150 (2), 291–303. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Zuber PK; Schweimer K; Rosch P; Artsimovitch I; Knauer SH, Reversible fold-switching controls the functional cycle of the antitermination factor RfaH. Nat Commun 2019, 10 (1), 702. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Giganti D; Albesa-Jove D; Urresti S; Rodrigo-Unzueta A; Martinez MA; Comino N; Barilone N; Bellinzoni M; Chenal A; Guerin ME; Alzari PM, Secondary structure reshuffling modulates glycosyltransferase function at the membrane. Nat Chem Biol 2015, 11 (1), 16–8. [DOI] [PubMed] [Google Scholar]
  • 37.Partch CL, Orchestration of Circadian Timing by Macromolecular Protein Assemblies. J Mol Biol 2020, 432 (12), 3426–3448. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Stein PE; Leslie AG; Finch JT; Carrell RW, Crystal structure of uncleaved ovalbumin at 1.95 A resolution. J Mol Biol 1991, 221 (3), 941–59. [DOI] [PubMed] [Google Scholar]
  • 39.Yamasaki M; Arii Y; Mikami B; Hirose M, Loop-inserted and thermostabilized structure of P1-P1’ cleaved ovalbumin mutant R339T. J Mol Biol 2002, 315 (2), 113–20. [DOI] [PubMed] [Google Scholar]
  • 40.Porter LL, Predictable fold switching by the SARS-CoV-2 protein ORF9b. Protein Sci 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Crescenzi O; Tomaselli S; Guerrini R; Salvadori S; D’Ursi AM; Temussi PA; Picone D, Solution structure of the Alzheimer amyloid beta-peptide (1–42) in an apolar microenvironment. Similarity with a virus fusion domain. Eur J Biochem 2002, 269 (22), 5642–8. [DOI] [PubMed] [Google Scholar]
  • 42.Patil SM; Xu S; Sheftic SR; Alexandrescu AT, Dynamic alpha-helix structure of micelle-bound human amylin. J Biol Chem 2009, 284 (18), 11982–91. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Rao JN; Jao CC; Hegde BG; Langen R; Ulmer TS, A combinatorial NMR and EPR approach for evaluating the structural ensemble of partially folded proteins. J Am Chem Soc 2010, 132 (25), 8657–68. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Walti MA; Ravotti F; Arai H; Glabe CG; Wall JS; Bockmann A; Guntert P; Meier BH; Riek R, Atomic-resolution structure of a disease-relevant Abeta(1–42) amyloid fibril. Proc Natl Acad Sci U S A 2016, 113 (34), E4976–84. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Cao Q; Boyer DR; Sawaya MR; Ge P; Eisenberg DS, Cryo-EM structure and inhibitor design of human IAPP (amylin) fibrils. Nat Struct Mol Biol 2020, 27 (7), 653–659. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Tuttle MD; Comellas G; Nieuwkoop AJ; Covell DJ; Berthold DA; Kloepper KD; Courtney JM; Kim JK; Barclay AM; Kendall A; Wan W; Stubbs G; Schwieters CD; Lee VM; George JM; Rienstra CM, Solid-state NMR structure of a pathogenic fibril of full-length human alpha-synuclein. Nat Struct Mol Biol 2016, 23 (5), 409–15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Chen BY; Ma XX; Guo PC; Tan X; Li WF; Yang JP; Zhang NN; Chen Y; Xia Q; Zhou CZ, Structure-guided activity restoration of the silkworm glutathione transferase Omega GSTO3–3. J Mol Biol 2011, 412 (2), 204–11. [DOI] [PubMed] [Google Scholar]
  • 48.Lambert C; Cadby IT; Till R; Bui NK; Lerner TR; Hughes WS; Lee DJ; Alderwick LJ; Vollmer W; Sockett RE; Lovering AL, Ankyrin-mediated self-protection during cell invasion by the bacterial predator Bdellovibrio bacteriovorus. Nat Commun 2015, 6, 8884. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Mezei M, Foldability and chameleon propensity of fold-switching protein sequences. Proteins 2020. [DOI] [PubMed] [Google Scholar]
  • 50.Jain V; Yogavel M; Oshima Y; Kikuchi H; Touquet B; Hakimi MA; Sharma A, Structure of Prolyl-tRNA Synthetase-Halofuginone Complex Provides Basis for Development of Drugs against Malaria and Toxoplasmosis. Structure 2015, 23 (5), 819–829. [DOI] [PubMed] [Google Scholar]
  • 51.Tseng R; Goularte NF; Chavan A; Luu J; Cohen SE; Chang YG; Heisler J; Li S; Michael AK; Tripathi S; Golden SS; LiWang A; Partch CL, Structural basis of the day-night transition in a bacterial circadian clock. Science 2017, 355 (6330), 1174–1180. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Littler DR; Harrop SJ; Fairlie WD; Brown LJ; Pankhurst GJ; Pankhurst S; DeMaere MZ; Campbell TJ; Bauskin AR; Tonini R; Mazzanti M; Breit SN; Curmi PM, The intracellular chloride ion channel protein CLIC1 undergoes a redox-controlled structural transition. J Biol Chem 2004, 279 (10), 9298–305. [DOI] [PubMed] [Google Scholar]
  • 53.Dishman AF; Tyler RC; Fox JC; Kleist AB; Prehoda KE; Babu MM; Peterson FC; Volkman BF, Evolution of fold switching in a metamorphic protein. Science 2021, 371 (6524), 86–90. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Podobnik M; Savory P; Rojko N; Kisovec M; Wood N; Hambley R; Pugh J; Wallace EJ; McNeill L; Bruce M; Liko I; Allison TM; Mehmood S; Yilmaz N; Kobayashi T; Gilbert RJ; Robinson CV; Jayasinghe L; Anderluh G, Crystal structure of an invertebrate cytolysin pore reveals unique properties and mechanism of assembly. Nat Commun 2016, 7, 11598. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.The PyMOL Molecular Graphics System, Version 2.0 Schrödinger, LLC. [Google Scholar]
