If knowledge regarding proteins were restricted to what is presented in contemporary biochemistry textbooks, one would think that all proteins adopt the three-dimensional structures encoded by their amino-acid sequences, and that each structure in turn provides the basis for each protein’s function. This sequence-to-structure-to-function hypothesis for protein structure-function relationships is highly ingrained in contemporary thinking about proteins.
In contrast to this ingrained view, many proteins have been experimentally characterized as lacking specific structure or as containing regions that lack specific structure. This does not mean that the sequence-to-structure-to-function hypothesis is wrong (it is very likely true for most enzymes, transmembrane proteins, and proteins that bind small molecules), but rather that it is incomplete. Although the purpose of this article is to highlight one of these nonfolding proteins, namely the Micro Exon Gene 14 (MEG-14) protein from the parasite Schistosoma mansoni (1), it is useful first to provide a brief background on these structure-lacking proteins and regions.
These intrinsically disordered proteins (IDPs) or IDP regions exist as interconverting conformational ensembles. Some confusion about them persists because they have been described by many different terms (2). IDPs and IDP regions are interesting because of their biological functions: they provide flexible linkers between structured domains and contain sites for protein-protein interactions; for binding DNA, tRNA, mRNA, rRNA, and metal ions; and for posttranslational modifications such as phosphorylation, methylation, acetylation, myristoylation, palmitoylation, and ADP-ribosylation, among many additional functions (3).
Many researchers have pointed out that protein folding is driven by the burial of hydrophobic side chains. Studies of hydrophobic side-chain burial using simplified lattice models led to the suggestion that sequences with very polar compositions would simply not fold, whereas sequences with appropriate mixtures of polar and nonpolar residues would fold, with different structures arising from different arrangements of the amino-acid residues (4). Simply put, sequence codes for three-dimensional structure or, if too polar, for lack of folding. None of the early articles on protein structure and folding, however, suggested that polar, nonfolding sequences might exist in nature or that such nonfolded proteins might carry out biological functions. Everything was discussed in terms of the ingrained concept that function follows folding.
Contrary to the hypothesis that function follows from structure, numerous reports from the 1950s to the 1990s described proteins or protein regions that lacked three-dimensional structure yet carried out biological functions (3). These reports, however, mostly treated each example as an isolated instance. Studying such examples as a group led us to ask: what causes IDPs and IDP regions to fail to fold into three-dimensional structures? The concept that hydrophobic side-chain burial drives protein folding, combined with the earlier lattice-model work (4), suggested a possible answer, namely that an excess of polar residues and a dearth of nonpolar ones might explain the observed lack of folding. To test this possibility, we compared the amino-acid compositions of proteins and regions shown to remain unfolded with those of proteins that fold into three-dimensional structures. The findings were clear: the nonfolding proteins and regions exhibited significantly higher polarity than the structured proteins and regions (5). In addition, these compositional differences were used as inputs for algorithms that predict whether a protein will remain disordered or fold into a three-dimensional structure (6).
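The composition comparison described above can be illustrated with a minimal sketch. It pools residue counts over two sets of sequences and reports the per-residue difference in fractional composition. The function names and the toy sequences are hypothetical and serve only to exercise the code; the published analysis (5) used curated datasets and statistical tests that are not reproduced here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequences):
    """Return the fractional amino-acid composition pooled over sequences."""
    counts = Counter()
    for seq in sequences:
        # Ignore anything that is not one of the 20 standard residues.
        counts.update(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def composition_difference(disordered, ordered):
    """Per-residue difference (disordered minus ordered) in composition."""
    d = composition(disordered)
    o = composition(ordered)
    return {aa: d[aa] - o[aa] for aa in AMINO_ACIDS}


# Toy, made-up sequences (not real proteins), used only as an illustration.
toy_disordered = ["PEPSEEKPQS", "GSPEEKKPSE"]
toy_ordered = ["VLIFWAVLMY", "ILVFAWLVIM"]

diff = composition_difference(toy_disordered, toy_ordered)
# Residues most enriched in the "disordered" toy set.
enriched = sorted(diff, key=diff.get, reverse=True)[:3]
```

A positive entry in `diff` marks a residue that is more frequent in the disordered set, and a negative entry marks one that is more frequent in the ordered set.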
Disorder prediction has become a regular feature of the Critical Assessment of (Protein) Structure Prediction (CASP) meetings (sponsored by the National Institute of General Medical Sciences, Bethesda, MD; http://www.nigms.nih.gov/), leading to substantial growth in the number of predictors. Although the CASP experiments suffer from the small size of their IDP datasets and from an excess of short disordered regions, they have the significant advantage that the predictions are carried out in a truly blind fashion. Prediction accuracies in recent CASP experiments fall in the 75–83% range (see http://predictioncenter.org), where the accuracy (ACC) is given as

$$\mathrm{ACC} = \frac{1}{2}\left(\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}} + \frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FP}}\right),$$

with TP and FN the numbers of disordered residues predicted correctly and incorrectly, and TN and FP the numbers of ordered residues predicted correctly and incorrectly, respectively.
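Read as the mean of per-residue sensitivity and specificity (the balanced accuracy reported in CASP disorder assessments; this reading is an assumption here), ACC can be computed as in the sketch below. The function name and the ten-residue label lists are hypothetical and do not come from any CASP target.

```python
def balanced_accuracy(predicted, observed):
    """Mean of sensitivity and specificity for per-residue disorder labels.

    Both arguments are equal-length sequences of booleans,
    where True means "disordered".
    """
    if len(predicted) != len(observed):
        raise ValueError("label sequences must have equal length")
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)          # disordered, predicted so
    fn = sum(1 for p, o in pairs if not p and o)      # disordered, missed
    tn = sum(1 for p, o in pairs if not p and not o)  # ordered, predicted so
    fp = sum(1 for p, o in pairs if p and not o)      # ordered, called disordered
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one disordered and one ordered residue")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)


# Hypothetical labels for a 10-residue stretch (not real data).
observed = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False, True, False]
acc = balanced_accuracy(predicted, observed)
```

Averaging the two rates rather than counting all correct residues keeps a predictor from scoring well simply by labeling every residue as ordered, which matters when ordered residues greatly outnumber disordered ones.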
An 86-residue central fragment of the MEG-14 protein from Schistosoma mansoni, spanning residues 27–112, is shown to be an IDP region using three high-quality disorder predictors and the experimental method of circular-dichroism spectroscopy (1). Although the predictions of disorder already suggest that the amino-acid sequence of the MEG-14 segment is polar, it is nevertheless interesting to compare this sequence visually with those of two classical structured proteins, the α-helix-rich sperm whale myoglobin and the β-sheet-rich hen egg-white lysozyme (Fig. 1). For this comparison, the same residue numbers were arbitrarily chosen in all three sequences. In the figure, the residues are color-coded according to whether they are order-promoting (blue), in-between (green), or disorder-promoting (red) (7). The MEG-14 segment (Fig. 1 A) contains just four order-promoting hydrophobic residues and 12 prolines, whereas the myoglobin and lysozyme segments (Fig. 1, B and C) contain 21 and 26 order-promoting hydrophobic residues, respectively, and just two prolines each. With so few hydrophobic residues and so many structure-breaking prolines, it is no wonder that the MEG-14 segment does not fold into a specific structured domain. Indeed, the disorder prediction for MEG-14 is confirmed by its circular-dichroism spectra (1), thus fulfilling the expectation that highly polar sequences should not fold (4). Note that proline, because it lacks a proton on its backbone nitrogen, behaves like a polar residue despite its hydrophobic side chain.
Figure 1. Sequence comparisons of MEG-14 with two structured proteins. The 86-residue fragments spanning residues 27–112 are given for (A) MEG-14 protein, (B) sperm whale myoglobin, and (C) hen egg-white lysozyme. The amino acids are color-coded (D) using values determined in Campen et al. (7), and the structure-breaking prolines, P, are underlined. The ranking of the amino acids shown in panel D was determined by finding the residue values that best separate experimentally determined ordered and disordered protein segments. The values range from –0.884 to –0.121 for order-promoting residues, from +0.007 to +0.060 for in-between residues, and from +0.166 to +0.987 for disorder-promoting residues. The experimental design of that work (7) called for order-promoting residues to have values between –1 and 0 and disorder-promoting residues to have values between 0 and +1. This convention was chosen so that these values would be like hydropathy scales in having positive values for hydrophilic residues and negative values for hydrophobic residues. Given their slightly positive values, the in-between residues could instead be considered weakly disorder-promoting. It should be kept in mind that these values could change slightly if the same methods were applied to other datasets of structured and disordered protein segments.
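The color-coding in the figure can be sketched in code as a lookup on a per-residue scale followed by a three-way classification. The numerical values below are the TOP-IDP values as recalled from Campen et al. (7) and should be checked against that paper before any real use; only the six range endpoints are stated in the caption. The function names and the toy sequence are hypothetical, and the toy sequence is not the MEG-14 sequence.

```python
from collections import Counter

# TOP-IDP scale as recalled from Campen et al. (7); verify against the paper.
TOP_IDP = {
    "W": -0.884, "F": -0.697, "Y": -0.510, "I": -0.486, "M": -0.397,
    "L": -0.326, "V": -0.121, "N": 0.007, "C": 0.020, "T": 0.059,
    "A": 0.060, "G": 0.166, "R": 0.180, "D": 0.192, "H": 0.303,
    "Q": 0.318, "S": 0.341, "K": 0.586, "E": 0.736, "P": 0.987,
}


def classify(residue):
    """Assign a residue to one of the three classes used in the figure."""
    value = TOP_IDP[residue]  # raises KeyError for nonstandard residues
    if value < 0:
        return "order-promoting"
    if value <= 0.060:  # upper end of the in-between range in the caption
        return "in-between"
    return "disorder-promoting"


def summarize_segment(sequence, start, end):
    """Summarize residues start..end (1-based, inclusive) of a sequence."""
    segment = sequence[start - 1:end].upper()
    if not segment:
        raise ValueError("empty segment")
    classes = Counter(classify(r) for r in segment)
    return {
        "length": len(segment),
        "prolines": segment.count("P"),
        "classes": dict(classes),
        "mean_score": sum(TOP_IDP[r] for r in segment) / len(segment),
    }


# Toy 120-residue sequence (made up), summarized over residues 27-112.
toy = "MPEPSKQELV" * 12
summary = summarize_segment(toy, 27, 112)
```

Applied to the three 86-residue fragments, a summary of this kind would give the counts of order-promoting residues and prolines that the text compares.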
The MEG-14 protein exhibits chameleon-like conformational changes depending on the environment (1). In this regard, MEG-14 resembles α-synuclein, an intensively studied IDP (8). Even the MEG-14 protein’s formation of helix upon dehydration (1) has been observed for other IDPs (9). The chameleon descriptor has been applied to certain amino-acid sequences and to several other proteins that might or might not be IDPs. Nevertheless, the shape-changing capacity of IDPs makes “chameleon” an appropriate descriptor for this group of proteins.
Another interesting feature of the MEG-14 protein is that its apparent molecular weight by sodium dodecyl-sulfate gel electrophoresis is slightly over 12,000 Da (1), whereas its estimated size is much smaller at ∼8640 Da, so the apparent value is about 1.4 times the estimate. Such aberrantly high apparent molecular weights by sodium dodecyl-sulfate gel electrophoresis are commonly observed for proteins with IDP regions (10) or, in this case, for a protein that is wholly an IDP.
Without regard to the disorder concept, it was previously suggested that the high sequence variability of MEG-14 and of a wide variety of other MEG proteins likely contributes to this parasite’s escape from immune detection, with infection of the human host lasting up to 30 years or more (11). This is not the first time that one or more disordered proteins have been implicated in helping a parasite avoid its host’s immune system. Like Schistosoma, the various malaria-causing Plasmodia also evidently use highly variable intrinsically disordered proteins to avoid immune detection (12). Thus, the two most devastating parasites in the world, Plasmodia and Schistosoma, both evidently escape immune detection using strategies based on IDPs. A third parasite suggested to use an IDP to escape immune detection is Staphylococcus. This organism uses a surface protein with a large IDP region to bind to cell matrix proteins, and yet this surface protein is not a very good antigen, likely because of its high flexibility (3). It is extremely interesting that these three widely divergent parasites all evidently use IDP-based strategies to avoid detection by their hosts’ defense systems.
What biophysical or biochemical mechanisms allow Schistosoma, Plasmodia, and Staphylococcus to use IDPs to evade their hosts’ immune systems? Do they use a common mechanism, or are different mechanisms employed? These questions provide the basis for important and interesting future research on IDPs.
Acknowledgments
A.K.D.’s research has been supported by National Institutes of Health grants No. R01LM007699 and R01GM071714 and by National Science Foundation grant No. EF 0849803.
References
1. Lopes J.L.S., Orcia D., Wallace B.A. Folding factors and partners for the intrinsically disordered protein micro-exon gene 14 (MEG-14). Biophys. J. 2013;104:2512–2520. doi: 10.1016/j.bpj.2013.03.063.
2. Dunker A.K., Babu M.M., Uversky V.N. What’s in a name? Why these proteins are intrinsically disordered. Intrins. Disord. Proteins. 2013;1:e24157. doi: 10.4161/idp.24157.
3. Dunker A.K., Iakoucheva L.M., Obradovic Z. Intrinsic disorder and protein function. Biochemistry. 2002;41:6573–6582. doi: 10.1021/bi012159+.
4. Shakhnovich E.I., Gutin A.M. Engineering of stable and fast-folding sequences of model proteins. Proc. Natl. Acad. Sci. USA. 1993;90:7195–7199. doi: 10.1073/pnas.90.15.7195.
5. Xie Q., Arnold G.E., Dunker A.K. The sequence attributes method for determining relationships between sequence and protein disorder. Genome Inform. 1998;9:193–200.
6. Romero P., Obradovic Z., Dunker A.K. Identifying disordered regions in proteins from amino acid sequence. Int. Conf. Neural Networks. 1997;1:90–95.
7. Campen A., Williams R.M., Dunker A.K. TOP-IDP-Scale: a new amino acid scale measuring propensity for intrinsic disorder. Protein Pept. Lett. 2008;15:956–963. doi: 10.2174/092986608785849164.
8. Uversky V.N. A protein-chameleon: conformational plasticity of α-synuclein, a disordered protein involved in neurodegenerative disorders. J. Biomol. Struct. Dyn. 2003;21:211–234. doi: 10.1080/07391102.2003.10506918.
9. Thalhammer A., Hundertmark M., Hincha D.K. Interaction of two intrinsically disordered plant stress proteins (COR15A and COR15B) with lipid membranes in the dry state. Biochim. Biophys. Acta. 2010;1798:1812–1820. doi: 10.1016/j.bbamem.2010.05.015.
10. Iakoucheva L.M., Kimzey A.L., Ackerman E.J. Aberrant mobility phenomena of the DNA repair protein XPA. Protein Sci. 2001;10:1353–1362. doi: 10.1110/ps.ps.40101.
11. DeMarco R., Mathieson W., Wilson R.A. Protein variation in blood-dwelling Schistosome worms generated by differential splicing of micro-exon gene transcripts. Genome Res. 2010;20:1112–1121. doi: 10.1101/gr.100099.109.
12. Feng Z.P., Zhang X., Norton R.S. Abundance of intrinsically unstructured proteins in P. falciparum and other apicomplexan parasite proteomes. Mol. Biochem. Parasitol. 2006;150:256–267. doi: 10.1016/j.molbiopara.2006.08.011.
