Molecular Biology and Evolution. 2010 Jun 17;27(11):2628–2641. doi: 10.1093/molbev/msq151

Genetic Organization, Length Conservation, and Evolution of RNA Polymerase II Carboxyl-Terminal Domain

Pengda Liu 1, John M Kenney 2, John W Stiller 1,*, Arno L Greenleaf 3,*
PMCID: PMC2981489  PMID: 20558594

Abstract

With a simple tandem iterated sequence, the carboxyl terminal domain (CTD) of eukaryotic RNA polymerase II (RNAP II) serves as the central coordinator of mRNA synthesis by harmonizing a diversity of sequential interactions with transcription and processing factors. Despite intense research interest, many key questions regarding functional and evolutionary constraints on the CTD remain unanswered. For example, what selects for the canonical heptad sequence, its tandem array across organismal diversity, and constant CTD length within given species? And how can a sequence-identical, repetitive structure orchestrate a diversity of simultaneous and sequential, stage-dependent interactions with both modifying enzymes and binding partners? Here we examine comparative sequence evolution of 58 RNAP II CTDs from diverse taxa representing all six major eukaryotic supergroups and employ integrated evolutionary genetic, biochemical, and biophysical analyses of the yeast CTD to further clarify how this repetitive sequence must be organized for optimal RNAP II function. We find that the CTD is composed of indivisible and independent functional units that span diheptapeptides and that it requires not only a flexible conformation around each unit but also an elastic overall structure. More remarkably, optimal CTD function is always achieved at approximately wild-type CTD length, rather than at a particular number of functional units, regardless of the characteristics of the sequence present. Our combined observations lead us to advance an updated CTD working model, in which functional, and therefore evolutionary, constraints require a flexible CTD conformation, determined by the CTD sequence and tandem register, to accommodate the diversity of CTD–protein interactions, and a specific CTD length, rather than a specific number of functional units, to correctly order and organize global CTD–protein interactions. Patterns of conservation of these features across evolutionary diversity have important implications for comparative RNAP II function in eukaryotes and can more clearly direct specific research on CTD function in currently understudied organisms.

Keywords: RNAPII CTD, transcription, functional organization, evolutionary conservation

Introduction

Proper gene regulation underlies cellular differentiation, morphogenesis, and the versatility and adaptability of any organism. Alterations in regulation are important in evolutionary change because control of the timing, location, and level of gene expression can have profound effects on protein function and, therefore, the phenotype and fitness of the organism. Studies over decades have revealed that many steps in gene expression initially viewed as independent are intricately coordinated, interconnected, and regulated in a sophisticated network (for reviews, see Maniatis and Reed [2002] and Proudfoot et al. [2002]). The central coordinator that couples this regulatory network together, along with many other functions carried out in the cell's nucleus, is RNA polymerase II (RNAP II); of particular importance is the stage-specific differentially modified sequence-repetitive carboxyl terminal domain (CTD) of its largest subunit (Rpb1).

Studies in yeast and animals have demonstrated that the CTD serves as a scaffold for the assembly of protein complexes that help coordinate not only mRNA biogenesis, including transcription initiation (Kim et al. 1994), promoter clearance (Corden 1993), histone methylation (Hampsey and Reinberg 2003), capping (Cho et al. 1997; McCracken, Fong, Rosonina, et al. 1997), elongation (Corden and Patturajan 1997), RNA processing (Proudfoot et al. 2002) (editing [Ryman et al. 2007], splicing [de la Mata and Kornblihtt 2006], polyadenylation and 3′ cleavage [McCracken, Fong, Yankulov, et al. 1997]), and termination (Gudipati et al. 2008; Vasiljeva et al. 2008) (and see review [Phatnani and Greenleaf 2006]), but also snRNA gene expression (Jacobs et al. 2004; Egloff et al. 2007; Egloff and Murphy 2008a). These widely varied functions of the CTD are accomplished through diversity in recognition of the step-dependent modification patterns on the CTD repeats, sometimes referred to as the “CTD code” (Buratowski 2003; Corden 2007; Egloff and Murphy 2008b). Although crystal structures of RNAP II have been solved at high resolution, the CTD is an exception; it appears to maintain an unordered structure during the whole transcription process, permitting interactions with different sets of CTD modifying enzymes, and binding factors (Meinhart et al. 2005; Phatnani and Greenleaf 2006).

The CTD comprises tandemly repetitive heptapeptides with the consensus sequence Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7 (YSPTSPS) and occurs in all green plants, yeasts, and animals tested to date, as well as in many protists including Amoebozoa, Microsporidia, chytrids, and choanoflagellates (Stiller et al. 2000; Stiller and Hall 2002; Stiller and Cook 2004). Its tandem structure is proposed to have arisen through amplifications of a repetitive DNA sequence (Chapman et al. 2008). The CTD is essential for viability in animals and yeasts, where it has been studied most extensively. The number of repeats present is somewhat proportional to genomic complexity, with multicellular animals and plants typically containing longer CTD regions than yeasts and other simpler forms (fig. 1 and supplementary fig. 1, Supplementary Material online).

FIG. 1.

Conservation and degeneration of the CTD across eukaryotic diversity. CTDs from individual taxa are aligned by recognizable heptapeptide motifs or by aligning tyrosine residues where not even degenerate heptapeptides can be identified. Green regions correspond to sequences that conform to the essential CTD functional units identified in yeast, yellow to conserved heptapeptides that do not have proper adjacent sequences to form a functional unit, and red to regions with substitutions that lethally disrupt CTD function in yeast. Full sequences with color coding are provided in supplementary figure 1 (Supplementary Material online). Taxonomic designations and phylogenetic associations are based on the Tree of Life Web Project (http://www.tolweb.org/tree/), with tentative relationships that remain controversial shown by dashed lines. Within larger taxonomic groups, and between related groups, the pattern of conservation or degeneration tends to be consistent. For example, in both animals and green plants, proximal heptads are arrayed consistently in viable functional units, whereas distal CTD sequences tend toward more degeneracy. In contrast, conserved heptads occur more distally in basidiomycete fungi. In other groups, such as ciliates and parasitic excavate taxa, there appears to be no purifying selection on CTD structure whatsoever. The functional significance of this variation, with respect to how RNAP II transcription is organized, remains unclear. Although not shown in this figure, within Plasmodium falciparum and related species, there is extensive variation in the number of tandem functional units present, which could reflect coevolution with mammalian host transcription systems (Kishore et al. 2009).

The consensus sequence and tandem structure are evolutionarily conserved from yeast to human but, interestingly, not among all eukaryotes. Initial phylogenetic analyses of widely diverse organisms indicated that the CTD's canonical sequence and repetitive structure were conserved strongly only in a set of taxonomic groups that was named the “CTD clade” (Stiller and Hall 2002), which clustered together in trees constructed from RPB1 sequences. Phylogenomic studies generally indicate that this “CTD clade” is not a natural evolutionary group, and broader sampling of diverse eukaryotes shows that the pattern of CTD conservation is more complicated than it first appeared (fig. 1).

In addition to the complete or nearly complete degeneration of the CTD in eukaryotic groups outside the “CTD clade,” certain groups of fungi have lost the well-ordered CTD structure found in yeasts (fig. 1). Moreover, organisms with tandem repeats of various alternative sequences are found, for example, in Mastigamoeba invertens (YSPASPA), Plasmodium falciparum (YSPTSPK), and the red algae Glaucosphaera vacuolata (YSPTSPT/H/Q) and Cyanidioschyzon merolae (YSPSSPVNA).

Although CTDs from different species vary in length and amount of degenerative sequence, some consistent patterns have emerged. The consensus heptapeptide and the tandem structure are conserved across a wide range of organisms and, even when not, the overall pattern of CTD-like motifs remains strongly conserved (fig. 1). Moreover, both length and relative pattern of canonical and degenerate regions are relatively conserved among related taxa (fig. 1). This raises a number of questions that are best addressed through integrated evolutionary and functional analyses. What selects for the specificity of the canonical heptad across broad organismal diversity? Why is this heptad maintained in a global tandem register? Why are both these features conserved strongly in some eukaryotic lineages, whereas one or both are altered dramatically in others? What selection pressures are responsible for maintaining a constant CTD length? Finally, how can a sequence-identical, repetitive structure orchestrate such a diversity of simultaneous and sequential, stage-dependent interactions with both modifying enzymes and binding partners?

In this article, we report novel results from genetic, biochemical, and biophysical analyses in budding yeast to develop a model of the evolution of sequence-identical but functionally segregated CTD essential units and of the selective constraints that keep them arranged in a tandemly repeated, overlapping global structure. We propose that the globally unordered protein structure conferred by tandem repeats of the canonical heptapeptide is critical for CTD function, particularly when large numbers of protein interactions must be accommodated, and that differences in the CTD's workload through the transcription cycle likely underlie variation across eukaryotic lineages.

Materials and Methods

Construction of Artificial CTD Sequence and Yeast Transformation

Artificial oligonucleotides with various substitutions or insertions were used to construct mutated CTDs, subcloned into a yeast RNAP II shuttle vector and transformed into yeast cells via the plasmid shuffle as described in detail previously (Liu et al. 2008).

Construction and purification of GST–CTD fusion proteins, in vitro phosphorylation assays for GST-tagged CTD mutants and quantification of relative levels of phosphorylation were as described previously (Liu et al. 2008).

Construction and Purification of Tag-Free CTD Mutant Proteins via Intein Tag System

Mutated CTD sequences were amplified by polymerase chain reaction from pSBO-CTD constructs and cloned into the polylinker region of the Escherichia coli pTXB1 expression vector (New England Biolabs: http://www.neb.com/nebecomm/products/productN6707.asp), upstream of the intein/chitin-binding domain from the Mycobacterium xenopi GyrA gene, to generate a series of C-terminal intein-tagged CTD mutants. Two sets of primers were used for sequences from wild type (WT) and mutants, in order to reduce non-CTD vector sequence in the protein products. pTXB1 F WT: 5′-GGT GGT CAT ATG TTT TCT CCA ACT TCC CCA-3′ and pTXB1 R WT: 5′-GGT GGT TGC TCT TCC GCA TGC AGG AGA TCC TGG GCT-3′ were designed for WT, whereas pTXB1 F6: 5′-GTG GAT CAT ATG GGG ATC CTT GGA GTC TCC-3′ and pTXB1 R3: 5′-GTA GCT TGC TCT TCC GGT CAT AGC ATA GGG GTA GCT-3′ were designed for all other CTD mutants. These two sets of primers introduced SapI and NdeI restriction sites into flanking sequences of WT and artificial CTDs. Transformants were sequenced to verify integrity. Vectors encoding various CTD fusion proteins were transformed into the E. coli strain PR2655 for intein-tagged fusion protein expression.

A fresh single colony carrying the correct mutated CTD sequence in the pTXB1 vector was inoculated directly into 1 L LB+AMP liquid medium and incubated in an orbital shaker at 37 °C until the optical density (OD)600 reached 0.5; 0.4 mM isopropyl β-D-1-thiogalactopyranoside was then added to induce protein expression. The culture was incubated for another 4 h, and the cells were spun down at 5,000 revolutions per minute (rpm) at 4 °C; 100 ml of column buffer (20 mM Na-4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid [HEPES], pH 8.5; 500 mM NaCl; and 1 mM ethylenediaminetetraacetic acid [EDTA]) was added, and the pellet was sonicated three times at 50% duty cycle for 45 s each. Cell debris was removed by centrifugation at a relative centrifugal force (rcf) of 25,000 for 30 min at 4 °C. The clarified cell extract was loaded on chitin columns, packed with 10 ml chitin beads and pre-equilibrated with column buffer, and washed with at least 20 bed volumes of column buffer. On-column cleavage of the intein tag from the intein–CTD fusion protein was conducted by quickly flushing the column with three bed volumes of the cleavage buffer (20 mM Na-HEPES, pH 8.5; 50 mM NaCl; 50 mM dithiothreitol; and 1 mM EDTA) to evenly distribute thiols through the column. The column flow was then stopped, and the column was left at 4 °C for 12 h for total intein tag cleavage. Target protein without the intein tag was eluted using the column buffer, and elution efficiency was monitored by OD280 reading. Elution fractions were resolved on 4–20% sodium dodecyl sulfate polyacrylamide gel electrophoresis gels with silver staining. Concentrations of tag-free CTD fusion proteins were determined using a UV/vis photometer at the manufacturer's standard settings (Biophotometer, Eppendorf, Hamburg).

Growth Curve Experiments

Three different single colonies for “WT” controls (pY1-transformed yeast) and for each CTD mutant were picked from 5-Fluoroorotic acid plates, inoculated into 10 ml liquid YPD medium, and incubated on a shaking incubator at 250 rpm overnight. OD600 readings were monitored for each sample starting around 0.1. For each reading, 50 μl of culture was pipetted out into a 1-mm cuvette, and the reading was repeated twice and averaged. The cell number was calculated based on the assumption that an OD600 of 0.1 equates to 1 × 10^6 cells. The cell number calculation was confirmed by hemocytometer counting. Measurements were taken until concentrations exceeded an OD600 of 1.0, and all data were input into Excel and analyzed by the following formula: doubling time = ln 2/(ln(A/A0)/t), where A = cell density (e.g., OD) at time t and A0 = initial cell density.
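As an illustration of the doubling-time formula above, the following minimal Python sketch computes the value from two density readings. The function name and the example readings are hypothetical and are not data from this study; the calculation assumes exponential growth between the two readings.

```python
import math


def doubling_time(a0: float, a: float, t: float) -> float:
    """Doubling time = ln 2 / (ln(A/A0) / t), in the same units as t.

    a0: initial cell density (e.g., OD600), a: density at time t.
    Assumes exponential growth between the two readings.
    """
    if a0 <= 0 or a <= a0 or t <= 0:
        raise ValueError("need 0 < a0 < a and t > 0")
    growth_rate = math.log(a / a0) / t  # specific growth rate per unit time
    return math.log(2) / growth_rate


# Hypothetical readings (not from the paper): OD600 rises from 0.1 to 0.8
# in 270 min, i.e., three doublings, so the result is about 90 min.
example = doubling_time(0.1, 0.8, 270.0)
print(f"doubling time: {example:.1f} min")
```

Averaging the doubling times of the three colonies per strain, as described above, would then be a simple mean over three such values.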

Protein Secondary Structure Prediction

Protein secondary structure predictions were performed on the Imperial College Phyre server (http://www.sbg.bio.ic.ac.uk/phyre/) using five different methods. CTD mutant protein sequences in FASTA format were submitted through the prediction window, and the results were obtained from the server Web page.

Circular Dichroism Measurements

All circular dichroism (CD) measurements were carried out on a Jasco J-720 CD instrument equipped with a temperature controller. CD spectra were the average of eight scans. All CTD spectra were recorded on protein samples in pure water at room temperature. CD parameter settings were as follows: sensitivity, standard (100 mdeg); start, 270 nm; end, 180 nm; data pitch, 0.1 nm; scanning mode, continuous; scanning speed, 50 nm/min; response, 4 s; band width, 2 nm; and accumulations, 8.

Results

The CTD Is Composed of Indivisible and Independent Functional Units that Span Diheptapeptides

Because of its tandemly iterated structure, a clear explanation of how the CTD sequence determines CTD function has been elusive. Nevertheless, genetic investigations in both yeast and animals have shown that in a given heptad repeat, Y1, S2P3, and S5P6 are essential for CTD function (West and Corden 1995); in contrast, neither T4 nor S7 is required for viability in yeast (Stiller et al. 2000). S7 phosphorylation has been shown to be critical for small nuclear RNA transcription (Chapman et al. 2007; Egloff et al. 2007) in mammalian cells and recently also was observed in yeast (Akhtar et al. 2009); however, given the viability of complete S7 → A7 Saccharomyces cerevisiae CTD variants, the significance of S7 phosphorylation for RNAP II transcription in yeast is unclear. In addition, a variety of substitution mutants are viable, including those with partial changes at essential positions (e.g., in half of all S2 residues) (West and Corden 1995), demonstrating that a great deal of flexibility is permitted in primary CTD structure. With respect to a globally conserved tandem structure, Ala insertions between adjacent repeats are lethal in yeast in any register, whereas Ala residues inserted between pairs of heptapeptides are well tolerated (Stiller and Cook 2004); this demonstrates that the irreducible CTD functional unit lies within paired heptads. Our genetic analyses of a series of yeast CTD substitution and insertion mutants within or between diheptads have further narrowed this indivisible unit down to two loosely linked sequence elements maintained in paired heptapeptides. Specifically, “S2P-S5P-S9P” must be present and associated with two tyrosines spaced seven amino acids apart in either a “Y1–Y8” (YSPTSPSYSP) or “Y8–Y15” (SPTSPSYSPTSPSY) orientation (Stiller and Cook 2004; Liu et al. 2008). 
For example, yeast mutants are viable with repeats containing only a minimal sequence of these two essential elements (Y1S2P3T4S5P6S7Y1S2P3T or “252,” named after the positions of essential phosphoserines), with alanines replacing the right-hand S5P6S7 residues (see construct “AR” in table 1).

Table 1.

Sequences and Phenotypes for CTD Mutants of Rpb1.

CTD Mutant | Repeated Sequence | Cell Phenotype at around WT CTD Length
AL | ASPTSPS YSPTSPS | Lethal
AR | YSPTSPS YSPTAAA | Viable ++++
YAA | YAATSPS YSPTSPS | Lethal
YATA | YAPTAPS YSPTSPS | Lethal
YAS | YASPTSPS YSPTSPS | Viable +++
YAP | YAPTSPS YSPTSPS | Lethal
APS | YSPTAPS YSPTSPS | Viable ++++
252 | YSPTSPS YSPS | Viable +++++
AA | AA YSPTSPS YSPTSPS | Viable ++++
5A | AAAAA YSPTSPS YSPTSPS | Slow growth +
7A | AAAAAAA YSPTSPS YSPTSPS | Lethal
A7 | AAAAAAA YSPTSPS YSPTSPS YSPTSPS | Viable +++
AP | AAPAAPA YSPTSPS YSPTSPS | Viable ++++

The “+” marks for each viable CTD mutant indicate the relative vigor of the yeast cells bearing the mutants, compared with WT cells (WT is labeled as five pluses).

We constructed a series of CTD variants, containing 6, 9, and 12 repeats, respectively, of the “252” minimum required sequence (YSPTSPSYSPS). Although these CTDs are missing the last three residues (SPS) of every other heptad completely, strains with 9 or more tandem copies of this sequence grew almost like WT (table 1, construct 252). These results show that the minimal (252) sequence, as defined by substitution mutants within a global heptad register, is sufficient to confer virtually WT CTD function, even outside the normal positional spacing of tandemly repeated heptapeptides (in these variants, the repeating unit is an 11-mer). In “252” CTDs, the absence of the last “SP” pair in all second repeats of diheptads eliminates the potential for them to serve as proximal heptads in overlapping units. This constrains the mutated CTD to a string of nonoverlapping individual minimal functional units and eliminates almost half the essential units potentially present for a given CTD length. These features also apply to many other viable CTD mutants we analyzed (table 1, rows AR, APS, and A7). Thus, the overall sequence required for most or all CTD functions is not the tandemly repeated heptads present but, rather, repeated units of three consecutive Ser-Pro pairs interspersed with Tyr residues that are spaced at a heptad interval.
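The contrast between overlapping units in the WT array and nonoverlapping units in the “252” constructs can be sketched with a simple string count. The Python snippet below is only an illustration under simplifying assumptions: it matches the literal “Y1–Y8” motif (YSPTSPSYSP) given earlier in this section, ignores the alternative “Y8–Y15” orientation, and ignores substitutions that are tolerated at nonessential positions. The function name is hypothetical.

```python
import re

# Literal "Y1-Y8" form of the essential unit as written in the text.
# Assumption: only this orientation and this exact sequence are counted.
UNIT_Y1_Y8 = "YSPTSPSYSP"


def count_units(ctd: str, unit: str = UNIT_Y1_Y8) -> int:
    """Count occurrences of `unit` in `ctd`, allowing overlapping matches."""
    # A zero-width lookahead matches at every start position of the motif.
    return len(re.findall(f"(?={unit})", ctd))


wt = "YSPTSPS" * 26           # 26 canonical heptads, 182 residues
m252 = "YSPTSPSYSPS" * 9      # nine copies of the 11-residue "252" repeat
ar = "YSPTSPSYSPTAAA" * 12    # twelve copies of the AR repeat

for name, seq in [("WT", wt), ("252", m252), ("AR", ar)]:
    print(name, len(seq), count_units(seq))
```

Under these assumptions, every heptad boundary in the WT array starts a new unit, so units overlap, whereas each “252” or AR repeat contributes exactly one unit and none spans a repeat junction.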

If the minimum essential yeast CTD unit is, indeed, the sequence identified genetically, how is this reconciled with observations, for example, that Candida albicans capping enzyme (Cgt1) requires at least three repeats to bind the CTD (Fabrega et al. 2003)? A close examination of the Cgt1-binding motifs on three CTD repeats provides the answer: the two major Cgt1-binding motifs, CDS1 and CDS2, are associated with two distinct CTD essential units, with the intervening CTD segment looped out and not in contact with the capping enzyme (Meinhart et al. 2005). Thus, effective CTD–Cgt1 interaction requires contact with more than one minimum CTD unit and enough flexibility between them to permit cooperative binding.

Flexible Protein Structure around Each Essential Unit Is Fundamental for CTD Function

If CTD functional units do not need to overlap, then what relationships do they have to each other and why are they conserved in a tandemly overlapping structure? The nearly WT growth of variant AR (YSPTSPSYSPTAAA)12 (Liu et al. 2008) (table 1) demonstrates that core functional units can be separated without major disruptions of CTD function. This case, perhaps, is not surprising given that AR mutants retain the normal heptad spacing between core CTD units. In contrast, additional distance between essential elements results in a progressive decline in CTD efficiency (table 1, rows AA, 5A); function is compromised fully when units are separated by seven Ala residues (table 1, row 7A) (Stiller and Cook 2004). Initially, this suggested that at least some essential functions require core units to be in relatively close proximity, for example, to accommodate direct protein steric interactions with multiple, adjacent CTD binding sites (Stiller and Cook 2004). However, we have found (see later results) that polyalanine insertions (e.g., AAAAAAA) engender the formation of stable secondary structures that could be detrimental to CTD functions. Alanine residues show the highest propensity to form helices (Chakrabartty et al. 1994); short poly-Ala peptides have been observed to adopt α-helical (Chakrabartty et al. 1994) and partial α-helical (Kentsis et al. 2004) structures and also have been observed in β-sheet or random coil (Toniolo et al. 1979), polyproline II (Shi et al. 2002), amyloid (Gasset et al. 1992), and helical dimer (Kohtani et al. 2004) conformations. The specific conformation adopted appears to depend on the length of the poly-Ala stretch (Blondelle et al. 1997), the broader protein context (Scheuermann et al. 2003), and properties of the aqueous solution (Peng et al. 2003).

To test this interpretation, we constructed CTDs with the same number of intervening residues (seven) between diheptads but disrupted the secondary structure by replacing Ala with Pro at the positions that Pro occupies in the canonical CTD sequence (e.g., AAPAAPA); this change restores relatively vigorous growth (table 1, row AP). Further, we inserted seven alanines between canonical triheptapeptides (A7 in table 1), thereby increasing the amount of normal, structurally unordered sequence around each given functional unit, and these mutants also grew vigorously. Thus, it is the quality of the inserted sequence, rather than the physical distance between functional units, that is critically important.

The respective behaviors of 7A and A7 mutants (table 1) are consistent with the presence of stable poly-Ala structures that interfere with function if they are too close to adjacent essential functional units. In turn, because prolines interfere with helix formation, the viability of AP mutants (with seven-residue spacers containing prolines) (table 1, row AP) suggests that the secondary structures adopted by 7A inserts may be helices that are absent when Pro residues interrupt the Ala stretches. Thus, our cumulative genetic results suggest that the essential repeating units in the CTD can function when separated by seven residues, as long as the separating sequences do not adopt stable secondary structures that impinge on protein interactions with adjacent heptapeptides.

We further investigated these genetics-based conclusions using physical–chemical approaches. First, as has been noted previously, we found that protein secondary structure predictions indicate a random coil predominates in WT repeats (supplementary fig. 2, Supplementary Material online); in contrast, the same prediction programs suggest that some helical structure is introduced by insertion of 7A spacers. To test these computational predictions, we analyzed WT and 7-alanine–mutated CTDs by CD spectroscopy. The CTDs analyzed were tag-free proteins purified using an intein self-cleavage system (see Methods). As expected, the WT CTD produced spectra consistent with predominantly random coil (fig. 2A). In contrast, both 7A and A7 CTDs produced spectra indicating significant presence of helical structure (fig. 2A). Finally, the AP CTD spectrum was extremely similar to that of WT, again indicating predominantly random coil (fig. 2A).

FIG. 2.

CD spectra of WT CTD peptide and CTD mutant peptides under different conditions. (A) CD spectra of WT, AP, 7A, and A7 proteins in pure water. The spectra are the average of eight accumulations for each protein. WT and AP are in the upper group with similar, predominantly random coil structure, whereas 7A and A7 are in the lower group with a majority of helical structure. The WT and 7A curves were adjusted to the same scale as AP and A7 by magnifying the signal. (B) CD temperature scan spectra for WT, AP, 7A, and A7 proteins. All four protein samples were measured at temperatures ranging from 10 °C to 60 °C at 10 °C intervals; “back to 10” indicates that the temperature was dropped directly from 60 °C to 10 °C. With increasing temperature, both WT and AP proteins became more unordered, and when the temperature dropped from 60 °C back to 10 °C, the spectra were not exactly the same as when beginning at 10 °C. These results suggest that changes in conformation from more ordered to less ordered as temperatures increased were not easily reversible and that the WT protein was much harder to bring back to its original conformation than was the AP mutant. As no solid material was observed in the sample cuvette at the end of the temperature scan measurements, the nonreversibility of the protein structures was not due to protein precipitation. In contrast, the 7A protein demonstrated a greater ability to recover its original structure when the temperature returned to 10 °C.

The secondary structures formed in the 7A and A7 CTDs appear to be quite stable. An isodichroic point was observed in the A7 CD melting spectrum around 200 nm (fig. 2B), suggesting a two-state helix-coil ellipticity model (Gans et al. 1991); however, more rigorous testing is needed to support the two-state model because an isodichroic point at the shorter wavelength does not rule out the possibility that local dichroic chromophores could significantly perturb ellipticity only at longer wavelengths (Wallimann et al. 2003). In any case, its closer match to a standard α-helical CD spectrum, along with the presence of an isodichroic point in its melting curve, does suggest that stronger helical structures are formed in the A7 CTD than in 7A.

It is noteworthy that polyalanine helical structures are more stable with three intervening CTD heptads (A7: AAAAAAA YSPTSPS YSPTSPS YSPTSPS AAAAAAA), yet do not have dramatically negative impacts on CTD function. In contrast, weaker helical structures directly next to essential CTD motifs (7A: AAAAAAA YSPTSPS YSPTSPS AAAAAAA) are lethal. These combined genetic and structural results indicate that placing ordered structures directly next to essential heptad motifs (underlined sequences shown above) impinges on the interactions between CTD functional units and binding partners. In contrast, unordered sequence on either side of each functional unit (A7) allows those proper interactions, even if more stable structural features are present nearby. This supports our model that strong secondary structures too close to CTD functional units are responsible for the deleterious and lethal effects of polyalanine insertions.

To test whether secondary structures in A7 or 7A CTDs could affect CTD kinase interactions with these proteins, we compared phosphorylation of WT and mutant CTDs by human CDK7 and CDK9 kinases, as well as yeast CTDK-I in vitro (Fig. 3). Consistent with their lethality, 7A CTDs, with only two heptads between each 7-alanine stretch (table 1), did not function detectably as substrate for any of the CTD kinases. In contrast, the A7 CTDs, with three heptads between each 7-alanine stretch, functioned as effective substrates for all three kinases. Both viability and phosphorylatability of 7-alanine CTDs with longer canonical sequences between Ala stretches (compare 7A and A7, table 1) are consistent with the hypothesis that Ala-dependent secondary structures interfere with accessibility of critical residues in adjacent functional units. Thus, CTD units of sufficient length (e.g., three heptads) can achieve significant levels of function, even when separated by incompatible physical structures. Our results support the following model: if canonical segments between interfering structural features are long enough to allow access of enzymes and binding proteins to core CTD units, essential CTD functions are accommodated, although not necessarily with WT efficiency.

FIG. 3.

In vitro phosphorylation assay for different polyalanine insertion mutant proteins. Relatively similar amounts of GST–CTD fusion proteins (except for WT, which was usually present at somewhat lower amounts) were incubated with different CDKs for 2 h at 30 °C (details in Materials and Methods). Half of each reaction mixture was resolved by SDS-PAGE; the gels were Coomassie stained and then exposed to an Amersham screen for 30 min and scanned on a Typhoon phosphorimager. The amounts of phosphate and protein on the gel were both quantified with ImageQuant software, and the relative phosphorylation (phosphate/amount of protein) of each mutant protein was normalized to the WT value (defined as 1.00 for each kinase), as shown in the charts in the figure.
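The normalization described in this legend (phosphate signal divided by protein amount, then scaled so that WT equals 1.00 for a given kinase) can be written out as a short Python sketch. The function name and all numeric values below are invented for illustration and are not measurements from this study.

```python
def relative_phosphorylation(signals, reference="WT"):
    """Normalize phosphate/protein ratios to the reference construct.

    signals: dict mapping construct name -> (phosphate_signal, protein_amount)
    for one kinase. Returns a dict in which the reference equals 1.0.
    """
    ratios = {name: phos / prot for name, (phos, prot) in signals.items()}
    ref = ratios[reference]
    return {name: ratio / ref for name, ratio in ratios.items()}


# Hypothetical quantification for one kinase (arbitrary units, not real data).
example = {
    "WT": (800.0, 2.0),
    "A7": (600.0, 3.0),
    "7A": (0.0, 3.0),
}
rel = relative_phosphorylation(example)
for name, value in rel.items():
    print(f"{name}: {value:.2f}")
```

Dividing by protein amount first matters here because, as noted above, the WT protein was usually loaded at somewhat lower amounts than the mutants.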

Taken together, genetic, biochemical, and biophysical analyses indicate that introducing strong structural features into the CTD sterically hinders access to nearby essential functional units. This would appear to put serious constraints on the kinds of evolution in primary sequence that are permitted. Specifically, selection should favor maintenance of a globally unstructured domain in which each functional unit remains fully available for requisite protein interactions.

Although introduction of stable structure elements can be lethal, we also should note that mutants with short (2-alanine) insertions exhibit slower growth and lower phosphorylation efficiencies than cells with a WT CTD (Liu et al. 2008) (table 1 and fig. 3). As there is no suggestion that two alanines should form stable secondary structures, this indicates that pushing apart the core elements or, more precisely, displacing them from their normal physical locations with respect to the transcription/processing apparatus, also is somewhat deleterious to CTD function.

CTD Length Rather than the Number of Essential Units Determines CTD Function

Like the positioning of essential units and canonical heptads, overall length also is a conserved feature of the CTD; however, yeast cells growing in favorable laboratory conditions do not have obvious functional deficiencies when only 13 of the normal 26 CTD repeats are present (West and Corden 1995). Combined with the likelihood that random length variation will be introduced frequently in repetitive sequences (Nonet et al. 1987), this begs the question, why is the number of CTD repeats so strongly conserved within and among closely related taxa? Without measurable phenotypic variation between strains with slightly different CTD length, assessing potential selective advantage of one length over another is a difficult proposition. On the other hand, given the identification of CTD core essential elements in yeast, along with the knowledge that strains with CTDs of about normal overall length but with nonconsensus repeat sequences frequently show significant growth defects (table 1: rows YAS, A7, and AP), it became possible to address several questions about constraints on CTD length. For example, what is most important for dictating optimal function: the total overall length of the CTD, the absolute number of essential sequence elements present, or the spacing of the elements along the CTD? To explore these issues, we performed growth experiments on a series of yeast strain sets containing varying lengths of mutated CTDs (fig. 4). For each strain set, the CTD contained different numbers of a particular repeating unit (e.g., “AR” in fig. 4 and table 1). Each strain set thus contained CTDs that were shorter than, equal to, or longer than the normal WT length (26 repeats × 7 aa/repeat = 182 aa). 
The repeating units used to construct the collection of CTDs included a substitution mutant that maintained the tandemly repeated heptad register but with an altered sequence (AR), another with repeats truncated to the essential functional unit (252), and four that contained insertions up to seven residues long between sets of tandemly paired heptads (YAS, 5A, A7, and AP).

FIG. 4.

Growth rates of several viable CTD mutants over a range of CTD lengths. (A) Sequences of the viable mutants tested around WT length in this study. The overall length (number of amino acids) is indicated above each sequence, and the essential functional units in each construct are underlined in green. The number of repeats for each CTD mutant is given after the name of the mutant. (B) The substitution mutant (AR) and insertion mutants (YAS, 5A, and A7) with increasing numbers of repeats, from fewer to far more than natural CTD length, were cultured under the same conditions, and cell growth was monitored by measuring OD600 at 30 °C. Three different colonies for each mutant were grown, and overall doubling times were averaged. For all insertion and substitution mutants, peak growth rates were consistently achieved nearest the natural CTD length. Both shortening and lengthening the CTD resulted in reduced cell growth. The average growth rate for control cells with 26 WT repeats is shown by a red dot. (C) The same data as in panel (B), plotted against the number of functional units instead of total CTD length. No consistent pattern emerges to indicate that a particular number of essential units is critical for CTD function. (D) The two CTD constructs with growth rates nearest to WT at near WT length were tested for growth rate under cold (15 °C) or heat shock (37 °C). Consistent with our model, the mutants grew much more slowly under stressful conditions (15 °C and 37 °C) than under normal conditions (30 °C), whereas almost no difference was observed in WT cells. This is consistent with the hypothesis that denser packing of functional CTD units in the WT sequence better accommodates conditions requiring greater CTD–PCAP interactions, and that pre-established PCAP binding on the CTD affects the efficiency of interactions with additional stress response PCAPs.
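As an illustration of how a doubling time can be derived from OD600 readings and averaged over colonies, the following is a minimal sketch. It assumes simple exponential growth between two readings; the text does not state which calculation was used here, and the function names and the sample readings are hypothetical.

```python
import math
from statistics import mean


def doubling_time(t1_h, od1, t2_h, od2):
    """Doubling time in hours, assuming exponential growth between two OD600 readings.

    This is the textbook formula t_d = (t2 - t1) * ln(2) / ln(OD2 / OD1);
    it is an assumption, not the procedure documented for fig. 4.
    """
    if t2_h <= t1_h:
        raise ValueError("second reading must be later than the first")
    if od1 <= 0 or od2 <= od1:
        raise ValueError("readings must satisfy 0 < od1 < od2")
    return (t2_h - t1_h) * math.log(2) / math.log(od2 / od1)


def mean_doubling_time(readings):
    """Average the doubling times of several colonies.

    `readings` is a list of (t1_h, od1, t2_h, od2) tuples, one per colony.
    """
    return mean(doubling_time(*r) for r in readings)


# Hypothetical readings for three colonies of one strain (not data from this study).
colonies = [
    (0.0, 0.10, 3.0, 0.40),  # OD quadruples in 3 h: two doublings
    (0.0, 0.10, 3.0, 0.40),
    (0.0, 0.20, 4.0, 0.80),  # OD quadruples in 4 h: two doublings
]
average = mean_doubling_time(colonies)
```

Averaging per-colony doubling times, rather than pooling raw OD values, keeps each colony's growth curve independent, which is one reasonable reading of "overall doubling times were averaged."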

Surprisingly, regardless of the sequence repeated, the length variant in each strain set showing the highest growth rate always contained the CTD nearest to normal length (182 aa) (fig. 4B), rather than the CTD with a WT-equivalent number of essential functional units (fig. 4C). For example, the variant AR(84) (the number following the name of each CTD mutant gives the total number of amino acids in the mutated CTD) is viable but contains only six essential functional units, fewer than the seven units contained within the eight WT repeats necessary for viability. This result suggests that it is CTD length rather than the number of essential units that governs functionality of the CTD (as reflected in the growth rate). Our other results are consistent with this suggestion. For example, variant CTDs with additional residues inserted between heptad pairs or triplets (5A, A7, AP) further reduce the number of essential functional units per length of CTD (see fig. 4A), yet they support viability. Variant AP(168), for example, is equivalent in length to a WT CTD with 24 heptad repeats and 23+ essential units (fig. 4A), but the variant CTD contains only 8 essential functional units. The fact that the AP(168) strain grows extremely well (fig. 4B and C) strongly supports the idea that CTD length influences CTD function much more than does the number of essential minimal units.
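The unit counts quoted above can be reproduced with a short sketch. It assumes that an essential functional unit is the string YSPTSPSYSPT (the diheptad-spanning unit given in the legend of fig. 5), that units may overlap, and that the CTD is an idealized array of consensus heptads; the real yeast CTD includes nonconsensus repeats. The helper names and the placeholder residue "X" for inserted positions are hypothetical and do not represent the actual construct sequences.

```python
HEPTAD = "YSPTSPS"
ESSENTIAL_UNIT = "YSPTSPSYSPT"  # assumed unit: one heptad plus the next four residues


def count_essential_units(seq, unit=ESSENTIAL_UNIT):
    """Count occurrences of `unit` in `seq`, allowing overlaps."""
    count = 0
    start = seq.find(unit)
    while start != -1:
        count += 1
        # Advance by one residue so overlapping units are also found.
        start = seq.find(unit, start + 1)
    return count


def tandem_ctd(n_repeats):
    """Idealized CTD made only of consensus heptads in tandem."""
    return HEPTAD * n_repeats


def paired_insertion_ctd(n_blocks, insert_len, filler="X"):
    """Idealized insertion mutant: heptad pairs separated by `insert_len` residues.

    `filler` is a placeholder residue, not the sequence of any real construct.
    """
    return (HEPTAD * 2 + filler * insert_len) * n_blocks


wt_like = tandem_ctd(26)                   # 182 residues
minimal = tandem_ctd(8)                    # eight tandem heptads
ap_like = paired_insertion_ctd(8, 7)       # 8 blocks of 14 + 7 residues

units_minimal = count_essential_units(minimal)
units_24 = count_essential_units(tandem_ctd(24))
units_ap_like = count_essential_units(ap_like)
```

Under these assumptions, a tandem array of n heptads carries n − 1 overlapping units, whereas each heptad pair followed by an insertion carries exactly one, which is why an insertion construct of near-WT length has far fewer units than a WT CTD of the same length.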

Although overall length is clearly a very important feature determining optimal CTD function, it is also clear that none of the nonconsensus CTD variant strains grows as well as WT cells with a normal length CTD. This is not surprising because both primary sequence and spacing of repeat units have been known for some time to affect CTD function (West and Corden 1995; Stiller and Cook 2004; Liu et al. 2008). Indeed, among the mutants we studied, all grew much more slowly than WT, except for two: 252 and AR variants of about normal length. In fig. 4A we see that variant 252 retains the highest density of essential units among the CTDs tested, other than WT (fig. 4). The figure also reveals that the AR CTD contains both a reasonably high density of essential units and the normal heptad register (e.g., fig. 4A). These observations are congruent with what is seen in the evolutionary history of the CTD in major lineages like animals and green plants, where tandem structure is more conserved than primary sequence in degenerate heptads (fig. 1). Thus, it appears that both overall CTD length and the tandem heptapeptide register are under strong purifying selection in these groups.

One additional novel result emerged from the comparative growth analyses. When preparing the constructs, we noticed that one AR mutant contained an inadvertent frameshift after S7 residues, which resulted in deletion of the last 13 residues present in the yeast WT CTD control (YPYDVPDYNENSR). We measured the growth rate of this mutant and found that it grew much more slowly than a variant with a similar length but a normally positioned stop codon (data not shown). This indicates that, even though the last 13 residues of S. cerevisiae Rpb1 are not essential for cell viability, as shown by this particular AR mutant, they play some important role in CTD function. To date, there have been no reports concerning such a function in yeast; in contrast, in humans, the last 10 CTD residues have been linked to CTD stability (Chapman et al. 2004, 2005).

Discussion

CTD Length and Organization of CTD-Associating Proteins

Extant results clearly support the idea that CTD repeats are functionally redundant. If this is so, then why are extra repeats retained and overall length strongly conserved within species? One model is that a certain length is required to maintain an optimal “loading platform” for CTD- and phosphoCTD-associating proteins (PCAPs) that will accommodate different configurations of these factors needed to survive all the conditions to which the cells are exposed. We propose three additional aspects of this model. The first is that binding of protein complexes responsible for “essential functions” (e.g., those required under permissive laboratory growth conditions) determines the minimum number of repeats (the essential scaffold) and confers strong purifying selection on CTD regions that retain overlapping tandem essential units (e.g., proximal CTD regions in plants and animals). Second, length beyond the minimum provides room to bind proteins involved in additional or accessory functions not usually required under laboratory growth conditions. Third, noncanonical repeat regions provide landing pads (or space) for proteins involved in more taxon-specific functions.

That the requirements of an “essential scaffold” determine the minimal CTD length is consistent with observations from CTD truncation mutants. For example, truncation of the yeast CTD to 10 repeats or substitution with 7 WT and 8 S2A repeats showed conditional growth (West and Corden 1995) but had little if any effect on basal transcription factor recruitment to promoters or on production of mature transcripts tested from constitutively expressed genes (Ahn et al. 2009). In contrast, constructs that retained less than the minimal viable CTD length showed severe defects in most or all essential RNA processing events (Custodio et al. 2007). A cautionary note for interpreting results from truncation studies in mammalian cells is that the last 10 residues of the WT CTD are responsible for CTD stability (Chapman et al. 2004, 2005). Thus, mammalian cell studies in which CTD truncations do not contain these last 10 amino acids may generate artifacts due to Rpb1 degradation. In yeast, there is no clear evidence for a role of the last nonconsensus part of the CTD, apart from the effects on growth rate in the AR mutant reported here.

Support for the idea that additional repeats are needed for events that do not occur under optimal conditions (e.g., stress responses) comes from yeast truncation mutants with about the minimal number of repeats needed for viability; these are cold and temperature sensitive and inositol auxotrophs (Nonet et al. 1987), as are shorter repeats with Ser2 or Ser5 substitutions at C- or N-terminus (West and Corden 1995). Apparently, the auxiliary functions required for survival under stress need additional CTD repeats. CTD length also seems to be important in mammals for a differentiation-specific alternative splicing event, repression of weak fibronectin extra domain I exon inclusion (de la Mata and Kornblihtt 2006). In other experiments, it was observed that mammalian cells with 31 CTD repeats are proficient in transcription, splicing, cleavage, and polyadenylation but are deficient in mature mRNA transport, suggesting that the missing CTD repeats are responsible for binding proteins needed for that function (Custodio et al. 2007). Finally, the CTD length required for viability of a mouse cell in culture (25 repeats) (Bartolomei et al. 1988) is less than that required for differentiation and development of a whole mouse (39 repeats) (Meininghaus et al. 2000). Although these results do not rule out the binding of “auxiliary” protein complexes to proximal positions along the CTD, they do indicate that additional repeats are required as more and more functions are incorporated into the transcriptional/processing factory.

An idea that emerges from these observations is that overall CTD length is determined by events that require the largest physical load of PCAPs. These events might include responses to signals from an external stress or to developmental cues during differentiation. At its natural length, the CTD accommodates all requisite binding factors, and these factors are arranged in (presumably) an optimal manner. From the earlier arguments, we might posit that a group of “core” PCAPs occupies CTD binding sites mainly proximal to the body of RNAP II. During stress or differentiation, auxiliary factors would then bind largely to distal sites (the “extra” repeats). In addition, our demonstration that decreased function occurs with all CTD variants extended beyond WT length supports the idea that the specific and/or relative positions of CTD binding sites are key to CTD functional efficiency and suggests that binding by many CTD-associating proteins is highly orchestrated. This view leads to the model in figure 5 and supplementary figure 3 (Supplementary Material online) for how changes in CTD primary sequence and length influence single PCAP binding as well as overall organization of PCAPs. One prediction of this model is that a shortened CTD will not properly accommodate the auxiliary PCAPs when required (fig. 5B). Conversely, an overly long CTD, containing distal repeats that could compete for binding factors, could deplete properly constituted binding complexes, thereby compromising CTD function (fig. 5C).

FIG. 5.

Putative models of the effects of CTD length alterations on PCAP organization along the CTD. Nucleosomes (DNA and histone core) are marked in orange. RNAP II and its CTD are labeled in purple. Various PCAPs are shown in different colors. Variation in CTD phosphorylation patterns is not shown or considered here. (A) On the WT CTD, the PCAPs are organized compactly and tightly in temporal and spatial order along the CTD. The dynamic CTD structure permits the formation of large, compound functional protein complexes. As shown, proteins H and L, which do not contact the phosphoCTD directly, are required for the organization of the complex by binding to other direct CTD-interacting PCAPs (C, D, G and I, J, K). In addition, proteins M, N, and O form another complex that collectively interacts directly with the CTD over multiple adjacent functional units. (B) The shortened CTD does not provide enough binding sites for all essential or conditionally required PCAPs. As illustrated in (B), around 2/3 of the CTD is missing, leading to the inability to load proteins J, K, L, M, N, O, and P on the CTD at the appropriate time and place. (C) The extended CTD increases the potential number of binding sites, which results in competing binding sites for the same PCAPs in different locations, thereby reducing the efficiency with which PCAPs localize to their optimal binding sites. Note how the association of proteins K and I with an extra binding site in the extended CTD region disrupts their normal organizational complex with J. (D) Effects of CTD insertions on PCAP binding. The inserted sequences between diheptads or essential units (YSPTSPS YSPT) are marked as purple helices. Insertions lead to increased distance between binding sites and accordingly change the spatial order or position of individual proteins. As shown in panel (D), protein H no longer can make contact with D and C, thus disabling the formation of a larger mega-complex. Similarly, although proteins J and I still bind to the CTD, no preferred conformation exists for the further binding of L. In some cases, protein associations are not fully disrupted by insertions, such as proteins M, N, and O because M binds to both N and O, thereby establishing a functional complex. Nevertheless, disruption of normal contact between N and O could reduce functional efficiency of the collective protein interactions.

We performed one additional experiment as an initial test of this model. Strains that contain mutant CTDs of near WT length and exhibit close to WT growth rates at permissive temperatures were grown at both low (15 °C) and high (37 °C) temperature. Under these more stressful conditions, which likely require the CTD to coordinate additional functions, all the mutants performed proportionally more poorly than WT controls (fig. 4D). These results are consistent with our proposed model and also with the prediction that the strongest selection on global CTD length should be evident under more complicated transcriptional regimens.

An additional kind of observation, combined with those already discussed, induces us to add one more feature to the model. There are some results in the literature to suggest that the distribution of bound proteins along the CTD may not be stochastic; instead, PCAPs may bind at specific sites or positions. In other words, different factors, carrying out distinct functions, may preferentially target different regions of CTD. One result consistent with this idea is that mammalian Spt6 selectively binds to the N-terminal consensus repeats in vitro (Yoh et al. 2008); another is the previously mentioned selective loss of one RNA processing event, proper mRNA transport to the cytoplasm, when certain CTD repeats are deleted from mammalian Rpb1 (Custodio et al. 2007). Spatial constraints on where direct-binding PCAPs can associate with the CTD, if confirmed, presumably exist to position these proteins for proper interactions with other components of the transcription elongation mega-complex. Depending on how spatially reproducible these interactions are, it may be that the many components associated with the CTD are organized into a smaller number of distinct multisubunit structures (e.g., see fig. 5).

A Panoramic View of the CTD

Key features of the RNAP II CTD conserved in evolution are the canonical heptapeptide sequence and a tandemly repeated structure, along with conservation of overall length and of patterns of sequence degeneracy among related taxa. The canonical repeat motif that selection has favored is remarkably adaptable. In each heptad, Y1, S2, T4, S5, and S7 can be phosphorylated, and P3 and P6 can shift between cis and trans conformations. Through all these changes, this particular configuration of amino acids remains relatively flexible, providing a vast array of potential binding sites for transcription, processing, and other factors (Sudol et al. 2001).

Thus, the specific sequence of the canonical heptad likely evolved because it optimized efficient modifiable regulation, while remaining largely unordered when unphosphorylated and, at the same time, permitting a wide variety of inducible structures when phosphorylated (Zhang and Corden 1991; Meinhart et al. 2005). With various proteins loaded on, however, the CTD is under greater regional and global constraints. For example, when factors bind near a given pair of heptads along the tandem array, the N- and C-terminal repeats can become functionally nonequivalent (Custodio et al. 2007). Thus, this simple set of identical repeats can be adjusted for scores of different and highly specific interactions, through time- and position-dependent modifications.

The fact that two canonical tandem heptads are required to form individual “essential” units could favor conservation of a global tandem structure as well; however, structurally flexible sequence inserted between the essential units also is fairly well tolerated. Nevertheless, a global tandem structure appears to optimize CTD function in a number of ways, at least in strongly conserved canonical regions where numerous proteins related to core functions must be accommodated, possibly in varying patterns during expression of different genes.

A tandem structure ensures that any given YSPTSPS motif is part of a core CTD functional unit. It also maximizes the flexibility of positional binding of those motifs by various protein factors. Because functional units overlap, a global, tandemly repeated structure provides the largest number and densest packing of binding positions along the length of CTD. This allows the most efficient positioning of multiple proteins with respect to each other. In other words, for those core factors that have evolved to interact with canonical essential units, context-dependent binding can be exploited most efficiently through maximization of binding sites over the fixed length of CTD. Therefore, even though most mutated CTDs have enough core units to accommodate essential CTD–protein interactions, they require proteins to bind at suboptimal positions (see fig. 5). Moreover, because each individual heptad is inherently unstructured, a tandem array results in a disordered structure along the entire length of the CTD. Thus, the globally repetitive CTD structure is likely to be under strong selection also because it maintains an inherently unfolded state within and around each essential repeat unit.
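The packing argument can be made concrete with a short worked calculation. It is a sketch under two assumptions: the CTD is an idealized array of consensus heptads in which each essential unit spans one heptad plus the first four residues of the next, and insertion mutants consist of heptad pairs each followed by a k-residue insertion. The symbols n, b, k, L, U, and ρ are introduced here for illustration only.

```latex
% Tandem array of n consensus heptads: units overlap, so U = n - 1.
L_{\mathrm{tandem}} = 7n, \qquad
U_{\mathrm{tandem}} = n - 1, \qquad
\rho_{\mathrm{tandem}} = \frac{n-1}{7n}
\quad\Rightarrow\quad
\rho_{\mathrm{tandem}}\big|_{n=26} = \frac{25}{182} \approx 0.137\ \text{units per residue}

% b blocks, each a heptad pair followed by a k-residue insertion: one unit per block.
L_{\mathrm{insert}} = b\,(14 + k), \qquad
U_{\mathrm{insert}} = b, \qquad
\rho_{\mathrm{insert}} = \frac{1}{14 + k}
\quad\Rightarrow\quad
\rho_{\mathrm{insert}}\big|_{k=7} = \frac{1}{21} \approx 0.048\ \text{units per residue}
```

With b = 8 and k = 7, this gives L = 168 and U = 8, which agrees with the length and unit count reported for AP(168) in the Results, whereas a tandem array of the same length (n = 24) carries 23 units.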

Finally, overall CTD length is likely conserved to avoid both the loss of intermittently required binding sites (usually at distal locations) and the addition of “excess” potentially competitive binding sites that could interfere with optimal, spatially specific protein arrangements.

The view of CTD evolution that emerges from these considerations takes into account both the CTD and its many associating components. Together these have been coevolving over much of eukaryotic evolution, with changes in the CTD constrained to avoid disrupting the highly orchestrated temporal and spatial interactions that occur in CTD-based multicomponent entities such as the transcription elongation mega-complex. This view agrees with comparative genomic studies that show an overall conservation of CTD-related proteins rather than conservation of their specific interactions with the CTD (Guo and Stiller 2005). In conclusion, requirements for maintenance of a dynamic microenvironment around each CTD unit, balanced by retention of an optimized macroenvironment for overall binding across the full CTD length, can explain the patterns of CTD evolution across the broad scale of eukaryotic diversity.

Supplementary Material

Supplementary figures 1–3 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

This work was supported by grant MCB 0133295 from the National Science Foundation to Dr J.W.S. and by grant R01GM040505 from the National Institutes of Health to Dr A.L.G.

References

  1. Ahn SH, Keogh MC, Buratowski S. Ctk1 promotes dissociation of basal transcription factors from elongating RNA polymerase II. EMBO J. 2009;28:205–212. doi: 10.1038/emboj.2008.280.
  2. Akhtar MS, Heidemann M, Tietjen JR, Zhang DW, Chapman RD, Eick D, Ansari AZ. TFIIH kinase places bivalent marks on the carboxy-terminal domain of RNA polymerase II. Mol Cell. 2009;34:387–393. doi: 10.1016/j.molcel.2009.04.016.
  3. Bartolomei MS, Halden NF, Cullen CR, Corden JL. Genetic analysis of the repetitive carboxyl-terminal domain of the largest subunit of mouse RNA polymerase II. Mol Cell Biol. 1988;8:330–339. doi: 10.1128/mcb.8.1.330.
  4. Blondelle SE, Forood B, Houghten RA, Perez-Paya E. Polyalanine-based peptides as models for self-associated beta-pleated-sheet complexes. Biochemistry. 1997;36:8393–8400. doi: 10.1021/bi963015b.
  5. Buratowski S. The CTD code. Nat Struct Biol. 2003;10:679–680. doi: 10.1038/nsb0903-679.
  6. Chakrabartty A, Kortemme T, Baldwin RL. Helix propensities of the amino acids measured in alanine-based peptides without helix-stabilizing side-chain interactions. Protein Sci. 1994;3:843–852. doi: 10.1002/pro.5560030514.
  7. Chapman RD, Conrad M, Eick D. Role of the mammalian RNA polymerase II C-terminal domain (CTD) nonconsensus repeats in CTD stability and cell proliferation. Mol Cell Biol. 2005;25:7665–7674. doi: 10.1128/MCB.25.17.7665-7674.2005.
  8. Chapman RD, Heidemann M, Albert TK, Meilhammer R, Flatley A, Meisterernst M, Kremmer E, Eick D. Transcribing RNA polymerase II is phosphorylated at CTD residue serine-7. Science. 2007;318:1780–1782. doi: 10.1126/science.1145977.
  9. Chapman RD, Heidemann M, Hintermair C, Eick D. Molecular evolution of the RNA polymerase II CTD. Trends Genet. 2008;24:289–296. doi: 10.1016/j.tig.2008.03.010.
  10. Chapman RD, Palancade B, Lang A, Bensaude O, Eick D. The last CTD repeat of the mammalian RNA polymerase II large subunit is important for its stability. Nucleic Acids Res. 2004;32:35–44. doi: 10.1093/nar/gkh172.
  11. Cho EJ, Takagi T, Moore CR, Buratowski S. mRNA capping enzyme is recruited to the transcription complex by phosphorylation of the RNA polymerase II carboxy-terminal domain. Genes Dev. 1997;11:3319–3326. doi: 10.1101/gad.11.24.3319.
  12. Corden JL. RNA polymerase II transcription cycles. Curr Opin Genet Dev. 1993;3:213–218. doi: 10.1016/0959-437x(93)90025-k.
  13. Corden JL. Transcription. Seven ups the code. Science. 2007;318:1735–1736. doi: 10.1126/science.1152624.
  14. Corden JL, Patturajan M. A CTD function linking transcription to splicing. Trends Biochem Sci. 1997;22:413–416. doi: 10.1016/s0968-0004(97)01125-0.
  15. Custodio N, Vivo M, Antoniou M, Carmo-Fonseca M. Splicing- and cleavage-independent requirement of RNA polymerase II CTD for mRNA release from the transcription site. J Cell Biol. 2007;179:199–207. doi: 10.1083/jcb.200612109.
  16. de la Mata M, Kornblihtt AR. RNA polymerase II C-terminal domain mediates regulation of alternative splicing by SRp20. Nat Struct Mol Biol. 2006;13:973–980. doi: 10.1038/nsmb1155.
  17. Egloff S, Murphy S. Role of the C-terminal domain of RNA polymerase II in expression of small nuclear RNA genes. Biochem Soc Trans. 2008a;36:537–539. doi: 10.1042/BST0360537.
  18. Egloff S, Murphy S. Cracking the RNA polymerase II CTD code. Trends Genet. 2008b;24:280–288. doi: 10.1016/j.tig.2008.03.008.
  19. Egloff S, O'Reilly D, Chapman RD, Taylor A, Tanzhaus K, Pitts L, Eick D, Murphy S. Serine-7 of the RNA polymerase II CTD is specifically required for snRNA gene expression. Science. 2007;318:1777–1779. doi: 10.1126/science.1145989.
  20. Fabrega C, Shen V, Shuman S, Lima CD. Structure of an mRNA capping enzyme bound to the phosphorylated carboxy-terminal domain of RNA polymerase II. Mol Cell. 2003;11:1549–1561. doi: 10.1016/s1097-2765(03)00187-4.
  21. Gans PJ, Lyu PC, Manning MC, Woody RW, Kallenbach NR. The helix-coil transition in heterogeneous peptides with specific side-chain interactions: theory and comparison with CD spectral data. Biopolymers. 1991;31:1605–1614. doi: 10.1002/bip.360311315.
  22. Gasset M, Baldwin MA, Lloyd DH, Gabriel JM, Holtzman DM, Cohen F, Fletterick R, Prusiner SB. Predicted alpha-helical regions of the prion protein when synthesized as peptides form amyloid. Proc Natl Acad Sci U S A. 1992;89:10940–10944. doi: 10.1073/pnas.89.22.10940.
  23. Gudipati RK, Villa T, Boulay J, Libri D. Phosphorylation of the RNA polymerase II C-terminal domain dictates transcription termination choice. Nat Struct Mol Biol. 2008;15:786–794. doi: 10.1038/nsmb.1460.
  24. Guo Z, Stiller JW. Comparative genomics and evolution of proteins associated with RNA polymerase II C-terminal domain. Mol Biol Evol. 2005;22:2166–2178. doi: 10.1093/molbev/msi215.
  25. Hampsey M, Reinberg D. Tails of intrigue: phosphorylation of RNA polymerase II mediates histone methylation. Cell. 2003;113:429–432. doi: 10.1016/s0092-8674(03)00360-x.
  26. Jacobs EY, Ogiwara I, Weiner AM. Role of the C-terminal domain of RNA polymerase II in U2 snRNA transcription and 3' processing. Mol Cell Biol. 2004;24:846–855. doi: 10.1128/MCB.24.2.846-855.2004.
  27. Juban MM, Javadpour MM, Barkley MD. Circular dichroism studies of secondary structure of peptides. Methods Mol Biol. 1997;78:73–78. doi: 10.1385/0-89603-408-9:73.
  28. Kentsis A, Mezei M, Gindin T, Osman R. Unfolded state of polyalanine is a segmented polyproline II helix. Proteins. 2004;55:493–501. doi: 10.1002/prot.20051.
  29. Kim YJ, Bjorklund S, Li Y, Sayre MH, Kornberg RD. A multiprotein mediator of transcriptional activation and its interaction with the C-terminal repeat domain of RNA polymerase II. Cell. 1994;77:599–608. doi: 10.1016/0092-8674(94)90221-6.
  30. Kishore SP, Perkins SL, Templeton TJ, Deitsch KW. An unusual recent expansion of the C-terminal domain of RNA polymerase II in primate malaria parasites features a motif otherwise found only in mammalian polymerases. J Mol Evol. 2009;68:706–714. doi: 10.1007/s00239-009-9245-2.
  31. Kohtani M, Schneider JE, Jones TC, Jarrold MF. The mobile proton in polyalanine peptides. J Am Chem Soc. 2004;126:16981–16987. doi: 10.1021/ja045336d.
  32. Liu P, Greenleaf AL, Stiller JW. The essential sequence elements required for RNAP II carboxyl-terminal domain function in yeast and their evolutionary conservation. Mol Biol Evol. 2008;25:719–727. doi: 10.1093/molbev/msn017.
  33. Maniatis T, Reed R. An extensive network of coupling among gene expression machines. Nature. 2002;416:499–506. doi: 10.1038/416499a.
  34. McCracken S, Fong N, Rosonina E, Yankulov K, Brothers G, Siderovski D, Hessel A, Foster S, Shuman S, Bentley DL. 5′-Capping enzymes are targeted to pre-mRNA by binding to the phosphorylated carboxy-terminal domain of RNA polymerase II. Genes Dev. 1997;11:3306–3318. doi: 10.1101/gad.11.24.3306.
  35. McCracken S, Fong N, Yankulov K, Ballantyne S, Pan G, Greenblatt J, Patterson SD, Wickens M, Bentley DL. The C-terminal domain of RNA polymerase II couples mRNA processing to transcription. Nature. 1997;385:357–361. doi: 10.1038/385357a0.
  36. Meinhart A, Kamenski T, Hoeppner S, Baumli S, Cramer P. A structural perspective of CTD function. Genes Dev. 2005;19:1401–1415. doi: 10.1101/gad.1318105.
  37. Meininghaus M, Chapman RD, Horndasch M, Eick D. Conditional expression of RNA polymerase II in mammalian cells. Deletion of the carboxyl-terminal domain of the large subunit affects early steps in transcription. J Biol Chem. 2000;275:24375–24382. doi: 10.1074/jbc.M001883200.
  38. Nonet M, Sweetser D, Young RA. Functional redundancy and structural polymorphism in the large subunit of RNA polymerase II. Cell. 1987;50:909–915. doi: 10.1016/0092-8674(87)90517-4.
  39. Peng Y, Hansmann UHE, Alves NA. Solution effects and the order of the helix-coil transition in polyalanine. J Chem Phys. 2003;118:2374–2380.
  40. Phatnani HP, Greenleaf AL. Phosphorylation and functions of the RNA polymerase II CTD. Genes Dev. 2006;20:2922–2936. doi: 10.1101/gad.1477006.
  41. Proudfoot NJ, Furger A, Dye MJ. Integrating mRNA processing with transcription. Cell. 2002;108:501–512. doi: 10.1016/s0092-8674(02)00617-7.
  42. Ryman K, Fong N, Bratt E, Bentley DL, Ohman M. The C-terminal domain of RNA Pol II helps ensure that editing precedes splicing of the GluR-B transcript. RNA. 2007;13:1071–1078. doi: 10.1261/rna.404407.
  43. Scheuermann T, Schulz B, Blume A, Wahle E, Rudolph R, Schwarz E. Trinucleotide expansions leading to an extended poly-L-alanine segment in the poly (A) binding protein PABPN1 cause fibril formation. Protein Sci. 2003;12:2685–2692. doi: 10.1110/ps.03214703.
  44. Shi Z, Olson CA, Rose GD, Baldwin RL, Kallenbach NR. Polyproline II structure in a sequence of seven alanine residues. Proc Natl Acad Sci U S A. 2002;99:9190–9195. doi: 10.1073/pnas.112193999.
  45. Stiller JW, Cook MS. Functional unit of the RNA polymerase II C-terminal domain lies within heptapeptide pairs. Eukaryot Cell. 2004;3:735–740. doi: 10.1128/EC.3.3.735-740.2004.
  46. Stiller JW, Hall BD. Evolution of the RNA polymerase II C-terminal domain. Proc Natl Acad Sci U S A. 2002;99:6091–6096. doi: 10.1073/pnas.082646199.
  47. Stiller JW, McConaughy BL, Hall BD. Evolutionary complementation for polymerase II CTD function. Yeast. 2000;16:57–64. doi: 10.1002/(SICI)1097-0061(20000115)16:1<57::AID-YEA509>3.0.CO;2-E.
  48. Sudol M, Sliwa K, Russo T. Functions of WW domains in the nucleus. FEBS Lett. 2001;490:190–195. doi: 10.1016/s0014-5793(01)02122-6.
  49. Toniolo C, Bonora GM, Mutter M. Conformations of poly(ethylene glycol) bound homooligo-L-alanines and -L-valines in aqueous solution. J Am Chem Soc. 1979;101:450–454.
  50. Vasiljeva L, Kim M, Mutschler H, Buratowski S, Meinhart A. The Nrd1-Nab3-Sen1 termination complex interacts with the Ser5-phosphorylated RNA polymerase II C-terminal domain. Nat Struct Mol Biol. 2008;15:795–804. doi: 10.1038/nsmb.1468.
  51. Wallimann P, Kennedy RJ, Miller JS, Shalongo W, Kemp DS. Dual wavelength parametric test of two-state models for circular dichroism spectra of helical polypeptides: anomalous dichroic properties of alanine-rich peptides. J Am Chem Soc. 2003;125:1203–1220. doi: 10.1021/ja0275360.
  52. West ML, Corden JL. Construction and analysis of yeast RNA polymerase II CTD deletion and substitution mutations. Genetics. 1995;140:1223–1233. doi: 10.1093/genetics/140.4.1223.
  53. Yoh SM, Lucas JS, Jones KA. The Iws1:Spt6:CTD complex controls cotranscriptional mRNA biosynthesis and HYPB/Setd2-mediated histone H3K36 methylation. Genes Dev. 2008;22:3422–3434. doi: 10.1101/gad.1720008.
  54. Zhang J, Corden JL. Phosphorylation causes a conformational change in the carboxyl-terminal domain of the mouse RNA polymerase II largest subunit. J Biol Chem. 1991;266:2297–2302.
