In this Perspective, McKnight discusses the evolution of proteins and the biological relevance of proteins with low-complexity domains (LCDs). McKnight delves into the biophysical and biochemical properties of LCDs that underlie their function and make them challenging to study, further highlighting the outstanding questions currently plaguing the field.
Keywords: low-complexity sequences, intrinsically disordered sequences, phase separation
Abstract
This perspective begins with a speculative consideration of the properties of the earliest proteins to appear during evolution. What did these primitive proteins look like, and how were they of benefit to early forms of life? I proceed to hypothesize that primitive proteins have been preserved through evolution and now serve diverse functions important to the dynamics of cell morphology and biological regulation. The primitive nature of these modern proteins is easy to spot. They are composed of a limited subset of the 20 amino acids used by traditionally evolved proteins and thus are of low sequence complexity. This chemical simplicity limits protein domains of low sequence complexity to forming only a crude and labile type of protein structure currently hidden from the computational powers of machine learning. I conclude by hypothesizing that this structural weakness represents the underlying virtue of proteins that, at least for the moment, constitute the dark matter of the proteome.
Late to the game
Walter Gilbert was early to recognize that life may have evolved from RNA polymers that fold into catalytic enzymes, not initially from protein polymers (Gilbert 1986). This role reversal was prompted by Tom Cech's astounding discovery that folded RNA molecules can function as enzymes (Kruger et al. 1982). Along this path, Harry Noller next discovered that ribosomes are RNA enzymes, not protein machines (Noller et al. 1992). Proof that Noller was spot-on was consummated by high-resolution structural images of ribosomes contributed by Ramakrishnan, Steitz, and Moore (Ban et al. 2000; Wimberly et al. 2000).
As a student of the 1960s and 1970s, I was astounded by Gilbert's concept of an RNA world. Yes, RNA may have been the sexiest part of the central dogma, yet our thousands of sophisticated proteins must have done the heavy lifting to build life. Now seeing that the catalytic, peptidyl transferase center of the ribosome is composed exclusively of RNA, hindsight is 20:20.
According to the RNA world concept of evolution, proteins did not begin to advance the complexity of life until relatively late in the game. Harry Noller has thought deeply as to how protein synthesis may have first evolved from an RNA world (Noller 2012). In considering the earliest of proteins, I began with several boundaries. First, I restricted my thinking to the earliest proteins that were genetically encoded and assembled via an RNA template decoded by a ribosome-like enzyme. I was not, that is, thinking of incidentally polymerized macromolecules such as polyamines. Second, I restricted my thinking to the first proteins that used the chemical architecture found as the backbone of modern proteins. The peptide bond is unusual in its chemical strength, rigidity, and unique capacity to convey secondary structure to protein polymers.
The first, genetically encoded proteins were almost certainly homopolymeric or the simplest of heteropolymers. Assuming these primordial proteins to be of use to RNA, the amino acid side chain of choice may have been positively charged. Extant proteins biased in these directions are histone tails and the disordered extensions attached to the globular folds of many ribosomal proteins. Both of these protein types are inherently disordered and enriched in lysine and arginine residues that facilitate favorable ionic interaction with the negatively charged phosphodiester backbones of DNA and RNA.
In helping neutralize the negative charge of nucleic acids, primitive cationic proteins may have facilitated some form of condensation of the mesobiotic genetic apparatus responsible for their synthesis. Once the forces of genetic selection began to be imposed on even the simplest of heteropolymers, it was easy to conceptualize how they could evolve in a manner optimizing the catalytic properties of the RNA enzymes from which life was invented.
Low-complexity domains—dark matter of the proteome
The complete genomic sequences of every organism on the planet will soon be known. This watershed compendium of data, coupled with 60 years of precision science emanating from the field of protein structure determination, now allows even the thorniest of biological questions access to a computational foundation. The machine learning capabilities of AlphaFold allow scientists to instantaneously know, at atomic resolution, the three-dimensional shape of almost any protein they might be working with (Jumper et al. 2021).
Whereas the future of biological discovery may belong to the computer, I focus this perspective on a problem that continues to require test tube experimentation. Upward of 20% of modern proteins cannot be tamed by AlphaFold. These AlphaFold evaders are easy to spot. Instead of using all of life's 20-amino-acid toolkit, they are composed of only a limited number of amino acids; they are of low sequence complexity. I refer to this 20% of modern proteins as the dark matter of the proteome.
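One common way to put a number on "low sequence complexity" is the Shannon entropy of the residue composition within a sliding window. The sketch below is only an illustration of that general idea, not a method used in this article: the function names, the 20-residue window, the 2.5-bit threshold, and both toy sequences are assumptions invented for the example.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a segment."""
    counts = Counter(segment)
    total = len(segment)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def low_complexity_windows(sequence: str, window: int = 20, max_entropy: float = 2.5):
    """Return (start, end, entropy) for every window at or below max_entropy.

    Positions are 0-based and end-exclusive. The window size and threshold
    are illustrative assumptions, not calibrated values.
    """
    hits = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        entropy = shannon_entropy(segment)
        if entropy <= max_entropy:
            hits.append((start, start + window, entropy))
    return hits


# Toy sequences invented for illustration (not real proteins).
repeat_like = "GYGQSSYGQS" * 4            # only four residue types: G, Y, Q, S
diverse = "ACDEFGHIKLMNPQRSTVWY" * 2      # all 20 residue types, used evenly

print(len(low_complexity_windows(repeat_like)))
print(len(low_complexity_windows(diverse)))
```

A sequence built from four residue types can never exceed 2 bits per window, so the repeat-like toy sequence is flagged along its whole length, whereas a window that samples all 20 amino acids evenly sits near 4.3 bits and is not flagged.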
Protein domains of low sequence complexity have been ugly headaches ever since their discovery three to four decades ago as the activation domains of the Gal4 and VP16 transcription factors (Ma and Ptashne 1987; Triezenberg et al. 1988). These oddball proteins exhibit simple, gibberish-like sequences. They are sensitive to proteolysis. They confer troublesome, aggregation-prone behavior on overexpression. Finally, with few exceptions, the mechanistic basis as to how low-complexity domains (LCDs) function remains a mystery.
Although initially discovered as ancillary components of the transcriptional apparatus, it is now clear that eukaryotic cells contain thousands of LCDs. They fill the permeability channel of nuclear pores, adorn both ends of all 75 of our intermediate filament proteins, and are ubiquitous parts of almost all RNA binding proteins, transcription factors, and regulators of chromatin structure. They are found on integral membrane proteins associated with the plasma membrane, nuclear envelope, and all intracellular organelles. LCDs assist in almost all aspects of cell biology.
How do low-complexity domains work?
Independent studies by Dirk Görlich on the phenylalanine:glycine (FG) repeat domains of nucleoporin proteins and by my own group on the tyrosine:glycine (YG) repeats of the fused-in-sarcoma (FUS) RNA binding protein led to the same discovery. Incubation of purified samples of either protein yielded phase-separated condensates (Frey et al. 2006; Frey and Görlich 2007; Han et al. 2012; Kato et al. 2012). In both cases, it was concluded that protein self-association was at the heart of phase separation. By self-association, I mean nothing more than a mundane form of homotypic protein:protein interaction labile to disassembly yet simultaneously endowed with the virtue of chemical specificity.
Both of these early studies of LCDs offered two concordant observations in support of this simple generalization. Görlich found that mutational alteration of the phenylalanine residues of nucleoporin proteins concomitantly impeded both phase separation in vitro and integrity of the nucleopore filtration barrier in vivo (Frey et al. 2006). The McKnight studies of the FUS RNA binding protein revealed the importance of tyrosine residues for both phase separation of the purified LCD in vitro and its ability to partition into nonmembranous, RNA-enriched granules in living cells (Kato et al. 2012). Thus, from the outset, both the Görlich and McKnight laboratories experimentally verified the importance of aromatic amino acids for LCD self-association, phase separation, and biological function. The Görlich and McKnight studies likewise found phase-separated samples of the nucleoporin and FUS LCDs to be prominently enriched in β-strand secondary structure (Ader et al. 2010; Kato et al. 2012).
These early reports on the behavior of the nucleoporin and FUS LCDs have been followed by thousands of studies over the past decade; phase separation is the rage. What are the take-home lessons learned from these studies? How do protein domains of low sequence complexity work? Two entirely different concepts have emerged from studies of LCD phase separation. On the one hand, biochemical, structural biological, and human genetic studies focused on LCDs have given evidence that they function by forming weakly folded molecular structures. These structures are chemically annealed via hydrogen bond networks constituting the junction of paired polypeptide backbones (Kato et al. 2022; Zhou et al. 2022, 2023).
In contrast, a more imaginative and widely accepted concept proposes that LCDs self-associate and phase-separate in the complete absence of any form of protein structure (Alberti and Hyman 2021; Lee et al. 2022). This theoretical view is reliant on computational, Monte Carlo simulations of polymer behavior that propose a universal “grammar” representing an entirely new concept of protein function (Wang et al. 2018). The grammar dictating LCD pairing is proposed to be encoded solely by amino acid side chains. The polypeptide backbone is inert according to this second concept of LCD-mediated phase separation, playing no role other than tethering together strings of amino acids.
These incongruous concepts for LCD-mediated phase separation are diametrically opposed as to how protein chemistry is envisioned to work. One concept posits that the polypeptide backbone is primarily responsible for the energetics favoring self-association. The other model discounts any role for the repetitive carbonyl oxygen and peptide nitrogen groups of the polypeptide backbone. In an effort to probe the disconnect between these divergent concepts from a new perspective, I have chosen to focus the remainder of this perspective on pharmacological studies of LCDs that have yet to capture wide recognition.
Pharmacological insight into LCD function is reliant on four indisputable facts. First, the purified forms of many different LCDs self-associate and phase-separate in test tube assays. Second, LCD self-association has recurrently been traced to the formation of intracellular structures that impart dynamic morphological order within cells. Included among these structures are a wide variety of nuclear and cytoplasmic puncta not surrounded by investing membrane. Third, these LCD-enriched intracellular structures are universally sensitive to dissolution following brief exposure of cells to an aliphatic alcohol with the formula of 1,6-hexanediol yet considerably less sensitive to the effects of a regioisomeric chemical bearing the same formula (2,5-hexanediol). Fourth, as assayed in test tube reactions, phase-separated LCDs are likewise dissolved more readily by 1,6-hexanediol than by 2,5-hexanediol. By weaving together these four observations, I hope to arrive at a simple description of how LCDs actually work.
Figure 1 shows images of cultured cells expressing a GFP-tagged form of the vimentin intermediate filament (IF) protein. Within minutes after exposure to 1,6-hexanediol (1,6-HD), vimentin filaments dissolve (Fig. 1, top panels). In contrast, as shown in the bottom panels of Figure 1, little or no filament disassembly is observed upon administration of 2,5-hexanediol (2,5-HD). Supplemental Movies S1 and S2 show the live-cell images from which Figure 1 was obtained. The biased sensitivity of filament melting by 1,6-HD, relative to 2,5-HD, has been observed both in vivo and in vitro for a number of different IF proteins (Lin et al. 2016). It is likewise the case that numerous cellular assemblies, including a variety of membraneless organelles within the nuclei and cytoplasm of eukaryotic cells, are differentially sensitive to melting by 1,6-HD relative to the 2,5-HD regioisomer bearing the same chemical formula. The N-terminal 100 residues of the vimentin IF protein constitute an LCD essential for filament assembly. Removal of the N-terminal LCD of vimentin prevents filament assembly. This same behavior has been observed for many of the 75 intermediate filament proteins found in human cells and tissues. In this regard, the N-terminal LCDs of IF proteins are of unique experimental value. These IF “head domains” have a defined functional activity readily measurable in biochemical assays of IF assembly.
Figure 1.

Related aliphatic alcohols differentially melt vimentin intermediate filaments in living cells. Cultured mammalian cells expressing a GFP:vimentin fusion protein (Gan et al. 2016) were exposed for 5 min to 1,6-hexanediol (top images) or 2,5-hexanediol (bottom images). 1,6-hexanediol was observed to disassemble the intracellular network of vimentin intermediate filaments more effectively than 2,5-hexanediol (Zhou et al. 2023). (Reprinted from Zhou et al. 2023, © 2023, with permission from Elsevier.)
When studied in isolation, IF head domains become phase-separated in a manner indistinguishable from the FUS and nucleoporin LCDs responsible for launching the field. As shown in Figure 2, condensates formed from the head domains of five different IF proteins are melted more readily by 1,6-HD than by 2,5-HD. The concordance of the in vivo and in vitro activities of these related aliphatic alcohols has evolved as a pharmacological mantra to the phase separation field. Were we able to mechanistically understand how 1,6-HD and 2,5-HD differentially affect LCD self-association and phase separation, this nugget of scientific truth might outpunch its weight in helping demystify the dark matter of the proteome.
Figure 2.
Related aliphatic alcohols differentially melt phase-separated head domains of five intermediate filament proteins. The isolated head domains of the neurofilament heavy, medium, and light isoforms, as well as head domains of the peripherin and vimentin intermediate filament proteins, were incubated under conditions enabling phase separation (Lin et al. 2016). (Adapted from Lin et al. 2016, © 2016, with permission from Elsevier.) Samples were exposed to equivalent concentrations of either 1,6-hexanediol (filled black circles) or 2,5-hexanediol (open squares) for the indicated times. In all cases, 1,6-hexanediol was more effective in dissolving phase-separated intermediate filament head domains than 2,5-hexanediol.
Aliphatic alcohols—the Rosetta Stone connecting live-cell observations to test tube biochemistry
Two recent studies used similar methods of solution NMR to probe the molecular interaction between aliphatic alcohols and purified LCDs. Fawzi and colleagues (Perdikari et al. 2022) investigated the interaction of aliphatic alcohols with the FUS LCD. 1,6-HD was observed to cause pronounced chemical shift perturbations of all carbonyl groups of the polypeptide backbone. In contrast, 1,6-HD prompted only modest shift perturbations to amino acid side chains. The likely chemical basis for backbone interactions was proposed to be hydrogen bonding via the terminal OH groups of 1,6-HD (as hydrogen bond donors) and backbone carbonyl oxygen atoms of the FUS LCD (as hydrogen bond acceptors).
Having observed minimal evidence of interaction between 1,6-HD and amino acid side chains of the FUS LCD, the Fawzi study (Perdikari et al. 2022) was inconsistent with the prevalent belief that aliphatic alcohols work by impeding the adhesive activity of hydrophobic or aromatic amino acid side chains thought to mediate LCD phase separation. Fawzi's solution NMR studies instead highlight the importance of 1,6-HD binding to the polypeptide backbone, particularly interactions between the alcohol and carbonyl oxygen atoms of the backbone. In wrapping up the results section of their report, Fawzi and colleagues conclude with the statement: “Taken together, these data suggest that condensate-modifying molecules like hexanediol might enhance the solvation of the protein backbone, leading to dissolution of the condensates” (Perdikari et al. 2022). It is hard to think of a more straightforward and well-put conclusion.
With extensive help from Jose Rizo-Rey, my trainees and I recently completed a study similar to the aforementioned work from the Fawzi laboratory (Perdikari et al. 2022). In our case, we used solution NMR assays to probe interaction between aliphatic alcohols and the LCD of the TDP-43 RNA binding protein (Gu et al. 2023). Our observations closely match those of Fawzi in showing pronounced chemical shift perturbations indicative of aliphatic alcohol binding to carbonyl oxygen atoms of the polypeptide backbone. Like Fawzi, we attribute the observed chemical shift perturbations to hydrogen bonding between OH groups of the aliphatic alcohols (as hydrogen bond donors) and carbonyl oxygen atoms along the polypeptide backbone (as hydrogen bond acceptors).
We additionally sought to observe differences in chemical shifts effected on the TDP-43 LCD by 1,6-HD as compared with 2,5-HD. Our interest in this point owes to the fact that the former chemical dissolves phase-separated LCDs more effectively than the latter both in vivo and in vitro. Were we to observe differences in perturbation of the TDP-43 NMR spectrum by the two aliphatic alcohols, we might gain useful mechanistic insight as to why 1,6-HD is more effective than 2,5-HD in melting dynamic cellular structures and dissolving phase-separated droplets formed from purified LCDs.
As shown in Figure 3, the two aliphatic alcohols caused similar chemical shift perturbations throughout most of the 151 residues specifying the TDP-43 LCD. Paradoxically, however, 1,6-HD caused stronger perturbations to backbone carbonyl groups than 2,5-HD across a contiguous region between residues 320 and 340 of the TDP-43 LCD. All such differences corresponded to chemical shift perturbations of carbonyl groups of the protein backbone.
Figure 3.
Solution NMR measurements of protein binding by two aliphatic alcohols. A protein fragment corresponding to the low-complexity domain of TDP-43, encompassing residues 263–414, was labeled with 13C and 15N, purified, and incubated with or without 10% levels of either 1,6-hexanediol or 2,5-hexanediol (Gu et al. 2023). (Reprinted from Gu et al. 2023.) The image shows chemical shift perturbation of peptide backbone carbonyl carbon atoms in response to 1,6-hexanediol (red) or 2,5-hexanediol (blue). Both aliphatic alcohols caused equivalent chemical shift perturbations to most carbonyl carbon atoms, as designated by purple, representing both red and blue shifts. 1,6-hexanediol caused substantially greater chemical shift perturbations than 2,5-hexanediol to all carbonyl oxygen groups within the region located between residues 320 and 340. This is the most evolutionarily conserved region of the TDP-43 LCD and coincides with the region known to form labile cross-β interactions, allowing for phase separation (Lin et al. 2020; Zhou et al. 2022).
Two features of this small, central region of the TDP-43 LCD are notable. First, these 20 amino acids are the most evolutionarily conserved of the entire LCD; not a single residue has changed within this region in the 400 million years of evolutionary divergence between fish and humans (Lin et al. 2020). Second, these 20 amino acids correspond to the region experimentally mapped to be essential for phase separation. The ultraconserved region located between residues 320 and 340 of the TDP-43 LCD specifies formation of a labile cross-β structure weakly annealed by a network of hydrogen bonds formed from peptide nitrogen NH groups of one strand pairing with carbonyl oxygen atoms of the sister strand (Zhou et al. 2022).
The solution NMR experiments exemplified by Figure 3 yield a simple conclusion. The backbone carbonyl groups differentially perturbed by the two aliphatic alcohols are one and the same as those required to hydrogen-bond with peptide NH groups to form the labile cross-β structure mediating self-association. As such, we can now understand why 1,6-HD dissolves phase-separated samples of the TDP-43 LCD more effectively than 2,5-HD. If carbonyl oxygen atoms within the ultraconserved region of the TDP-43 LCD are sequestered by 1,6-HD, it follows that they cannot simultaneously become hydrogen-bonded to the backbone NH groups of another polypeptide strand so as to enable self-association and phase separation.
Synthesis of a simple idea
A clear picture of LCD function has now emerged. Aliphatic alcohols melt phase-separated condensates by sequestering carbonyl oxygen atoms of the polypeptide backbone. The dissociative activity of aliphatic alcohols observed in living cells matches their activity observed in test tube assays. By elucidating how aliphatic alcohols work in a chemical sense, we are now able to understand the interactions most essential for LCD self-association and phase separation both in vitro and in vivo.
The self-association and consequent phase separation of LCDs is specified by short networks of hydrogen bonds formed along the polypeptide backbone between peptide NH groups of one chain hydrogen bonding to carbonyl oxygen atoms of the other chain. There is little new to this discovery; almost nothing we have observed diverges from what was taught to us seven decades ago by perhaps the greatest chemist of the 20th century: Linus Pauling (Pauling and Corey 1951). As shown schematically in Figure 4, short networks of hydrogen bonds formed along the polypeptide backbones of annealed LCDs represent the adhesive energy required for self-association and phase separation.
Figure 4.
Schematic diagram of hydrogen bond networks responsible for labile, cross-β annealing of polypeptide backbones. Short regions of backbone pairing are shown to anneal two polypeptide regions of low sequence complexity in a cross-β conformation as described by Linus Pauling over seven decades ago (Pauling and Corey 1951). (Left) Hydrogen bond network of a protein pair annealed in a parallel orientation. (Right) Hydrogen-bonding pattern of strands annealed in an antiparallel orientation.
The implications of this simple description of LCD self-association may be important to our perception of protein aggregation and neurodegenerative disease. Knowing that small regions of otherwise disordered LCDs self-associate in a labile cross-β conformation to perform their biological function, what prevents these labile cross-β structures from polymerizing into pathological aggregates? How do disease-causing mutations in or around these labile structures tip the polymerization balance toward pathological aggregation? How do various forms of post-translational modification affect this balance so as to impact either biology or pathology?
In returning to the proteins first appearing at the dawn of evolution, we can add a potentially provocative idea. Aside from using cationic side chains to charge-neutralize and condense the RNA machines from which they were synthesized, it is possible that primitive proteins used the chemistry of their polypeptide backbones to self-associate and phase-separate just as we are now observing for their modern counterparts. In other words, RNA-enriched protein condensates may go back to the dawn of evolution.
Why has this crude form of protein structure been preserved through evolution? I offer that protein structures poised at the threshold of thermodynamic equilibrium are of optimal design for regulation by post-translational modification. Although constituting only 10%–20% of the proteome, LCDs house 75% of all forms of post-translational modification; this is where the rubber hits the road for protein regulation (Woodsmith et al. 2013; Pejaver et al. 2014). Biology thrives, I submit, by the use of dynamic structures built on design principles accentuating the virtue of weakness.
Five loose ends
I close this perspective by considering five outstanding questions. First, why is it that only small regions of 10–30 residues are chosen to facilitate the homotypic annealing of LC domains? Many LCDs are hundreds of residues in length. Why are they so much larger than the transient structures they use to impart biological function? One possibility is that, in the unliganded state, LCDs collapse into monomeric blobs that mask their cross-β-forming region. In other words, when not in use, these potentially dangerous, cross-β-prone regions may be hidden away within loosely condensed monomers. This concept may account for the importance of phenylalanine and tyrosine residues liberally distributed throughout phase separation-competent LCDs. I speculate that these aromatic side chains may enable weakly adhesive intramolecular interactions useful for monomer condensation (Figure 5).
Figure 5.

Speculative concept of a low-complexity domain as a collapsed monomer masking its cross-β-forming region. In the unliganded, monomeric state, protein domains of low sequence complexity may collapse via weakly adhesive intrastrand π:π stacking of aromatic amino acid side chains. The collapsed state may hide/mask the polypeptide region of a low-complexity domain responsible for forming self-associative cross-β interactions.
Second, how are cross-β-forming regions selectively specified for homotypic annealing? At one level, the answer to this question is simple. The amino acid sequences of LCDs, degenerate as they might be, must specify the regions that have evolved to self-associate. At present, we can find these regions in only one of two ways: either by experimentation or by mapping the locations of disease-causing mutations (Murray et al. 2018; Zhou et al. 2022). It is possible that the small regions allowing for labile self-association are biased toward evolutionary conservation, as is the case for the cross-β-forming region located between residues 320 and 340 of the TDP-43 LCD. If so, computational methods may even now be of help to the field by pinpointing the small regions required for in-register, cross-β pairing.
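If conservation does mark these regions, one simple computational starting point would be to scan a multiple sequence alignment for the most conserved short window. The following is a minimal sketch under stated assumptions, not an established pipeline: the scoring rule (fraction of sequences sharing the most common residue in a column), the function names, and the three-sequence toy alignment are all invented here for illustration, and gap characters are naively treated as ordinary symbols.

```python
from collections import Counter


def column_conservation(alignment: list[str]) -> list[float]:
    """Fraction of sequences sharing the most common residue in each column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for column in zip(*alignment):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores


def most_conserved_window(alignment: list[str], window: int) -> tuple[int, float]:
    """Return (0-based start, mean conservation) of the best-scoring window.

    Ties are resolved in favor of the earliest window.
    """
    scores = column_conservation(alignment)
    best_start, best_mean = 0, -1.0
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean > best_mean:
            best_start, best_mean = start, mean
    return best_start, best_mean


# Toy alignment invented for illustration: columns 4-8 are identical in all
# three sequences, while the flanking columns differ in every sequence.
toy_alignment = [
    "GSNQAMAAAGYS",
    "SGQNAMAAASGY",
    "NQGSAMAAAYSG",
]

print(most_conserved_window(toy_alignment, window=5))
```

A real analysis would need orthologous sequences spanning a suitable evolutionary distance, a proper alignment, and a scoring scheme that handles gaps and conservative substitutions; the window length would also have to be chosen to match the 10–30 residue scale of the regions in question.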
It is inevitable that expedient methods will evolve to allow generation of data sets that identify large numbers of self-association-competent regions within LCDs. The grist of large amounts of accurate data will feed the computational scientists sure to put experimentalists like me out of business. As a challenge to big data scientists, I point out that each of our 75 intermediate filament (IF) proteins contains a region allowing for labile, homotypic, cross-β self-association. These functionally essential regions are universally localized within the N-terminal “head domains” of IF proteins, and these head domains are uniform in size (∼100 amino acids). Inventive scientists are sure to resolve how IF head domains self-associate by use of more facile methods than the tedious forms of experimentation we have deployed over the past 7–8 years (Lin et al. 2016; Sysoev et al. 2020; Zhou et al. 2021, 2023).
My third loose end goes back to the Rosetta Stone of aliphatic alcohols. How is it that 1,6-HD is more effective in melting labile cross-β structures than 2,5-HD? The two molecules share the same chemical formula and are equally hydrophobic. Why, then, is one regioisomer more active than the other? The primary alcohol groups of 1,6-HD are attached to either end of the aliphatic carbon chain. The secondary alcohols of 2,5-HD are separated by a shorter distance. I speculate that these varied geometries might, by the slightest of margins, better qualify 1,6-HD to simultaneously form hydrogen bonds with two consecutive carbonyl oxygen atoms of the polypeptide backbone. My trainees and I have postulated this idea elsewhere, showing that the distance between alcohol groups on 1,6-HD may allow concomitant hydrogen bonding to two neighboring carbonyl oxygen atoms along the polypeptide backbone (Kato et al. 2022). If so, simple chemical geometry may explain why 1,6-HD melts labile cross-β structures more effectively than 2,5-HD. A similar vein of thinking may explain the efficacy of 1,2-HD in compromising the permeability barrier of nucleopores (Ribbeck and Görlich 2002) and melting cellular assemblies enriched in LCDs. Carbonyl oxygen atoms of the polypeptide backbone contain two hydrogen bond-accepting π orbitals. As such, the geometric shape of 1,2-HD may account for its avidity to carbonyl oxygen atoms of the polypeptide backbone, as observed by solution NMR spectroscopy (Perdikari et al. 2022).
As a fourth question, I ask how protein domains of low sequence complexity achieve a sufficiently high concentration in living cells to self-associate. Test tube reactions require that purified LCDs be incubated at concentrations of ≥10–100 µM to allow for phase separation. I hypothesize that these concentrations may be achieved in living cells via LCD attachment to conventionally folded protein domains. In innumerable instances, we already understand how conventionally folded domains nucleate the formation of ordered assemblies. The head domains of intermediate filament proteins, for example, are brought into close proximity and high local concentrations via the pairing of long α helices that assemble in a coiled-coil conformation. The FG domains filling the central channel of nuclear pores coalesce at a high concentration owing to their attachment to conventionally structured domains forming the annulus of nuclear pores. The LCDs associated with hnRNPs are forced to high local concentration via the oligomeric coating of long RNA polymers enabled by structured KH, RRM, or pumilio domains. In other words, I offer that the weak interactions formed by LCD self-association are enabled by the more avid and highly specific forms of macromolecular interaction systematically elucidated by biochemists and structural biologists over the past six decades. By constituting weak and transient forms of “icing on the cake” to cellular assemblies, I identify any nexus formed by transiently paired LCDs as an opportune locus for biological regulation.
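To give a feel for the numbers, the sketch below converts between bulk concentration and molecules per unit volume, and estimates the effective concentration of a few tethered chains confined to a small sphere. Only the unit arithmetic is certain (1 µM corresponds to roughly 602 molecules per cubic micrometer). The copy number and confinement radius in the usage example are hypothetical values chosen purely for illustration, not measurements of any assembly discussed here.

```python
import math

AVOGADRO = 6.02214076e23               # molecules per mole (exact SI value)
LITERS_PER_CUBIC_MICROMETER = 1e-15    # 1 µm^3 = 1e-15 L


def molecules_per_cubic_micrometer(micromolar: float) -> float:
    """Number of molecules in 1 µm^3 at the given bulk concentration."""
    molar = micromolar * 1e-6
    return molar * AVOGADRO * LITERS_PER_CUBIC_MICROMETER


def local_concentration_micromolar(copies: int, radius_nm: float) -> float:
    """Effective concentration of `copies` molecules confined to a sphere."""
    radius_um = radius_nm * 1e-3
    volume_um3 = (4.0 / 3.0) * math.pi * radius_um ** 3
    liters = volume_um3 * LITERS_PER_CUBIC_MICROMETER
    moles = copies / AVOGADRO
    return moles / liters * 1e6


# Bulk range quoted in the text for test tube phase separation.
for conc in (10.0, 100.0):
    print(conc, "µM ->", round(molecules_per_cubic_micrometer(conc)), "molecules per µm^3")

# Hypothetical tethering scenario (assumed values, for illustration only):
# 8 LCD copies held within a sphere of 10 nm radius by a folded scaffold.
print(round(local_concentration_micromolar(copies=8, radius_nm=10.0)), "µM (effective)")
```

Under these assumed values, a handful of chains held within nanometers of one another reach an effective concentration well above the 10–100 µM range required in test tube reactions, which is the arithmetic behind the idea that folded scaffolds can raise LCDs to self-associating concentrations.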
My final loose end admits that this perspective is heavily biased with attention to homotypic LCD interactions. It is undoubtedly the case that LCDs coalesce heterotypically. How does this work? How, for example, does the C-terminal domain (CTD) of RNA polymerase II, which represents a cardinal example of an LCD, interact in a specific manner with condensates formed by the LCDs of both FUS and TAF-15 (Kwon et al. 2013)? Once LCDs become homotypically paired to establish transient structural order, might it be that the newly formed surfaces of these structures allow for unique interaction with other proteins? Whereas we know that CTD binding to phase-separated samples of FUS and TAF-15 is stringently reversed by cyclin-dependent kinase-mediated phosphorylation of the CTD (Kwon et al. 2013), we are ignorant of the molecular shape of this heteromeric, CTD:LCD interface. This ignorance prevents us from understanding, in a chemical sense, how the presence or absence of phosphate groups on the heptad repeats of the CTD can flip a molecular switch sitting at the heart of the eukaryotic transcriptional apparatus. By concluding with this dilemma, I hope to emphasize that studies of the dark matter of the proteome are actually in their infancy.
Closing thought
The proteomes of eukaryotic organisms contain thousands of LCDs. Many of these LCDs constantly move in and out of a structurally ordered state. In other words, thousands of proteins within our cells oscillate in and out of molecular structures poised at the threshold of thermodynamic equilibrium. This closing thought favors a more dynamic perspective on cell biology than I learned as a student.
Acknowledgments
I thank Xiaoming Zhou, Jinge Gu, Masato Kato, Glen Liszczak, Deepak Nijhawan, Ueli Schibler, Rich Losick, Art Horwich, and Mark Ptashne for reading this manuscript and offering useful criticisms. The McKnight Laboratory at University of Texas Southwestern Medical Center has long been supported by funds from both the National Institutes of Health and an anonymous donor.
Footnotes
Supplemental material is available for this article.
Article published online ahead of print. Article and publication date are online at http://www.genesdev.org/cgi/doi/10.1101/gad.351465.123.
Competing interest statement
The author declares no competing interests.
References
- Ader C, Frey S, Maas W, Schmidt HB, Görlich D, Baldus M. 2010. Amyloid-like interactions within nucleoporin FG hydrogels. Proc Natl Acad Sci 107: 6281–6285. doi:10.1073/pnas.0910163107
- Alberti S, Hyman AA. 2021. Biomolecular condensates at the nexus of cellular stress, protein aggregation disease and ageing. Nat Rev Mol Cell Biol 22: 196–213. doi:10.1038/s41580-020-00326-6
- Ban N, Nissen P, Hansen J, Moore PB, Steitz TA. 2000. The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science 289: 905–920. doi:10.1126/science.289.5481.905
- Frey S, Görlich D. 2007. A saturated FG-repeat hydrogel can reproduce the permeability properties of nuclear pore complexes. Cell 130: 512–523. doi:10.1016/j.cell.2007.06.024
- Frey S, Richter RP, Görlich D. 2006. FG-rich repeats of nuclear pore proteins form a three-dimensional meshwork with hydrogel-like properties. Science 314: 815–817. doi:10.1126/science.1132516
- Gan Z, Ding L, Burckhardt CJ, Lowery J, Zaritsky A, Sitterley K, Mota A, Costigliola N, Starker CG, Voytas DF, et al. 2016. Vimentin intermediate filaments template microtubule networks to enhance persistence in cell polarity and directed migration. Cell Syst 3: 252–263.e8. doi:10.1016/j.cels.2016.08.007
- Gilbert W. 1986. Origin of life: the RNA world. Nature 319: 618. doi:10.1038/319618a0
- Gu J, Zhou X, Sutherland L, Kato M, Jaczynska K, Rizo J, McKnight SL. 2023. Oxidative regulation of TDP-43 self-association by a β-to-α conformational switch. Proc Natl Acad Sci 120: e2311416120. doi:10.1073/pnas.2311416120
- Han TW, Kato M, Xie S, Wu LC, Mirzaei H, Pei J, Chen M, Xie Y, Allen J, Xiao G, et al. 2012. Cell-free formation of RNA granules: bound RNAs identify features and components of cellular assemblies. Cell 149: 768–779. doi:10.1016/j.cell.2012.04.016
- Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, et al. 2021. Highly accurate protein structure prediction with AlphaFold. Nature 596: 583–589. doi:10.1038/s41586-021-03819-2
- Kato M, Han TW, Xie S, Shi K, Du X, Wu LC, Mirzaei H, Goldsmith EJ, Longgood J, Pei J, et al. 2012. Cell-free formation of RNA granules: low complexity sequence domains form dynamic fibers within hydrogels. Cell 149: 753–767. doi:10.1016/j.cell.2012.04.017
- Kato M, Zhou X, McKnight SL. 2022. How do protein domains of low sequence complexity work? RNA 28: 3–15. doi:10.1261/rna.078990.121
- Kruger K, Grabowski PJ, Zaug AJ, Sands J, Gottschling DE, Cech TR. 1982. Self-splicing RNA: autoexcision and autocyclization of the ribosomal RNA intervening sequence of Tetrahymena. Cell 31: 147–157. doi:10.1016/0092-8674(82)90414-7
- Kwon I, Kato M, Xiang S, Wu L, Theodoropoulos P, Mirzaei H, Han T, Xie S, Corden JL, McKnight SL. 2013. Phosphorylation-regulated binding of RNA polymerase II to fibrous polymers of low-complexity domains. Cell 155: 1049–1060. doi:10.1016/j.cell.2013.10.033
- Lee DSW, Strom AR, Brangwynne CP. 2022. The mechanobiology of nuclear phase separation. APL Bioeng 6: 021503. doi:10.1063/5.0083286
- Lin Y, Mori E, Kato M, Xiang S, Wu L, Kwon I, McKnight SL. 2016. Toxic PR poly-dipeptides encoded by the C9orf72 repeat expansion target LC domain polymers. Cell 167: 789–802.e12. doi:10.1016/j.cell.2016.10.003
- Lin Y, Zhou X, Kato M, Liu D, Ghaemmaghami S, Tu BP, McKnight SL. 2020. Redox-mediated regulation of an evolutionarily conserved cross-β structure formed by the TDP43 low complexity domain. Proc Natl Acad Sci 117: 28727–28734. doi:10.1073/pnas.2012216117
- Ma J, Ptashne M. 1987. Deletion analysis of GAL4 defines two transcriptional activating segments. Cell 48: 847–853. doi:10.1016/0092-8674(87)90081-X
- Murray DT, Zhou X, Kato M, Xiang S, Tycko R, McKnight SL. 2018. Structural characterization of the D290V mutation site in hnRNPA2 low-complexity-domain polymers. Proc Natl Acad Sci 115: E9782–E9791. doi:10.1073/pnas.1806174115
- Noller HF. 2012. Evolution of protein synthesis from an RNA world. Cold Spring Harb Perspect Biol 4: a003681. doi:10.1101/cshperspect.a003681
- Noller HF, Hoffarth V, Zimniak L. 1992. Unusual resistance of peptidyl transferase to protein extraction procedures. Science 256: 1416–1419. doi:10.1126/science.1604315
- Pauling L, Corey RB. 1951. The pleated sheet, a new layer configuration of polypeptide chains. Proc Natl Acad Sci 37: 251–256. doi:10.1073/pnas.37.5.251
- Pejaver V, Hsu WL, Xin F, Dunker AK, Uversky VN, Radivojac P. 2014. The structural and functional signatures of proteins that undergo multiple events of post-translational modification. Protein Sci 23: 1077–1093. doi:10.1002/pro.2494
- Perdikari TM, Murthy AC, Fawzi NL. 2022. Molecular insights into the effect of alkanediols on FUS liquid-liquid phase separation. bioRxiv doi:10.1101/2022.05.05.490812
- Ribbeck K, Görlich D. 2002. The permeability barrier of nuclear pore complexes appears to operate via hydrophobic exclusion. EMBO J 21: 2664–2671. doi:10.1093/emboj/21.11.2664
- Sysoev VO, Kato M, Sutherland L, Hu R, McKnight SL, Murray DT. 2020. Dynamic structural order of a low-complexity domain facilitates assembly of intermediate filaments. Proc Natl Acad Sci 117: 23510–23518. doi:10.1073/pnas.2010000117
- Triezenberg SJ, Kingsbury RC, McKnight SL. 1988. Functional dissection of VP16, the trans-activator of herpes simplex virus immediate early gene expression. Genes Dev 2: 718–729. doi:10.1101/gad.2.6.718
- Wang J, Choi J-M, Holehouse AS, Lee HO, Zhang X, Jahnel M, Maharana S, Lemaitre R, Pozniakovsky A, Drechsel D, et al. 2018. A molecular grammar governing the driving forces for phase separation of prion-like RNA binding proteins. Cell 174: 688–699.e16. doi:10.1016/j.cell.2018.06.006
- Wimberly BT, Brodersen DE, Clemons WM, Morgan-Warren RJ, Carter AP, Vonrhein C, Hartsch T, Ramakrishnan V. 2000. Structure of the 30S ribosomal subunit. Nature 407: 327–339. doi:10.1038/35030006
- Woodsmith J, Kamburov A, Stelzl U. 2013. Dual coordination of post translational modifications in human protein networks. PLoS Comput Biol 9: e1002933. doi:10.1371/journal.pcbi.1002933
- Zhou X, Lin Y, Kato M, Mori E, Liszczak G, Sutherland L, Sysoev VO, Murray DT, Tycko R, McKnight SL. 2021. Transiently structured head domains control intermediate filament assembly. Proc Natl Acad Sci 118: e2022121118. doi:10.1073/pnas.2022121118
- Zhou X, Sumrow L, Tashiro K, Sutherland L, Liu D, Qin T, Kato M, Liszczak G, McKnight SL. 2022. Mutations linked to neurological disease enhance self-association of low-complexity protein sequences. Science 377: eabn5582. doi:10.1126/science.abn5582
- Zhou X, Kato M, McKnight SL. 2023. How do disordered head domains assist in the assembly of intermediate filaments? Curr Opin Cell Biol 85: 102262. doi:10.1016/j.ceb.2023.102262