RNA. 2022 Nov;28(11):1534–1541. doi: 10.1261/rna.079322.122

Molecular basis for the recognition of the AUUAAA polyadenylation signal by mPSF

Pedro A Gutierrez 1, Jia Wei 1, Yadong Sun 1,1, Liang Tong 1
PMCID: PMC9745836  PMID: 36130077

Abstract

The polyadenylation signal (PAS) is a key sequence element for 3′-end cleavage and polyadenylation of messenger RNA precursors (pre-mRNAs). This hexanucleotide motif is recognized by the mammalian polyadenylation specificity factor (mPSF), consisting of CPSF160, WDR33, CPSF30, and Fip1 subunits. Recent studies have revealed how the AAUAAA PAS, the most frequently observed PAS, is recognized by mPSF. We report here the structure of human mPSF in complex with the AUUAAA PAS, the second most frequently identified PAS. Conformational differences are observed for the A1 and U2 nucleotides in AUUAAA compared to the A1 and A2 nucleotides in AAUAAA, while the binding modes of the remaining 4 nt are essentially identical. The 5′ phosphate of U2 moves by 2.6 Å and the U2 base is placed near the six-membered ring of A2 in AAUAAA, where it makes two hydrogen bonds with zinc finger 2 (ZF2) of CPSF30, which undergoes conformational changes as well. We also attempted to determine the binding modes of two rare PAS hexamers, AAGAAA and GAUAAA, but did not observe the RNA in the cryo-electron microscopy density. The residues in CPSF30 (ZF2 and ZF3) and WDR33 that recognize PAS are disordered in these two structures.

INTRODUCTION

Nascent messenger RNAs (pre-mRNAs) in eukaryotes must undergo a series of modifications before they can be translated into proteins. These modifications include the addition of a 5′ cap, splicing of exons, and 3′-end cleavage and polyadenylation. Pre-mRNA 3′-end cleavage and polyadenylation—or 3′-end processing—involves an endonucleolytic cleavage event at a specific site in the 3′ untranslated region, followed by the addition of a polyadenylate [poly(A)] tail (Proudfoot 2011; Yang and Doublie 2011; Shi and Manley 2015; Sun et al. 2020). This 3′-end processing is critical for mRNA stability, nuclear export and translation into protein.

In mammalian cells, the polyadenylation signal (PAS) is a key sequence element in the RNA required for initiation of 3′-end processing. PAS consists of 6 nt (Proudfoot and Brownlee 1974; Fitzgerald and Shenk 1981; Wickens and Stephenson 1984) and is typically located 10–30 nt upstream of the cleavage site. The most frequently observed PAS hexamer is AAUAAA, found in over 50% of human and mouse mRNAs (Beaudoing et al. 2000; Tian et al. 2005; Gruber et al. 2016). The second most frequently observed PAS hexamer is AUUAAA, with a frequency of ∼10%. Many other PAS hexamers have also been identified, primarily single-nucleotide variants of AAUAAA and AUUAAA, but each at much lower frequency (<2%). These include AAGAAA and GAUAAA. The PAS variants are often associated with alternative polyadenylation (Edwalds-Gilbert et al. 1997; Gautheret et al. 1998; Tian and Manley 2017).
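The positional relationship described above can be illustrated with a minimal Python sketch. The function name `find_pas`, the toy sequence, and the convention that the distance is counted from the 3′ end of the hexamer to the cleavage position are assumptions made for illustration only; they are not taken from the cited studies.

```python
# Hypothetical helper: scan an RNA sequence for PAS hexamers that lie
# 10-30 nt upstream of a given cleavage position (0-based index).
PAS_HEXAMERS = ("AAUAAA", "AUUAAA")  # the two most frequent hexamers


def find_pas(rna, cleavage_pos, hexamers=PAS_HEXAMERS, min_dist=10, max_dist=30):
    """Return (start, hexamer, distance) tuples for qualifying hexamers.

    distance = number of nucleotides between the 3' end of the hexamer
    and the cleavage position (an assumed convention).
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input
    hits = []
    for start in range(0, len(rna) - 5):
        hexamer = rna[start:start + 6]
        if hexamer not in hexamers:
            continue
        distance = cleavage_pos - (start + 6)
        if min_dist <= distance <= max_dist:
            hits.append((start, hexamer, distance))
    return hits


# Toy sequence (not a real transcript): hexamer at index 3, 15-nt spacer.
toy = "GGC" + "AUUAAA" + "C" * 15 + "GG"
print(find_pas(toy, cleavage_pos=24))
```

Rare variants such as AAGAAA or GAUAAA could be included by passing a longer `hexamers` tuple.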

The AAUAAA PAS is recognized by the mammalian polyadenylation specificity factor (mPSF) (Chan et al. 2014; Schonemann et al. 2014)—a multisubunit subcomplex of the cleavage and polyadenylation specificity factor (CPSF) (Gilmartin et al. 1988; Murthy and Manley 1992). mPSF consists of protein factors CPSF160, WDR33, CPSF30, and Fip1. CPSF160 contains three β-propeller domains (BPA, BPB, BPC) and a carboxy-terminal domain (CTD) (Fig. 1A). WDR33 contains a WD40 β-propeller domain near the amino terminus. CPSF30 contains five zinc fingers (ZF1–ZF5) near the amino terminus and a zinc knuckle at the carboxyl terminus.

FIGURE 1.

Overall structure of the human CPSF160–WDR33–CPSF30–AUUAAA PAS RNA quaternary complex. (A) Domain organizations of human CPSF160, WDR33, and CPSF30. The domains of CPSF160 and WDR33 are labeled, and the zinc fingers of CPSF30 are in green. The collagen-like segment of WDR33 is in gray. The canonical isoform of CPSF30 is shown. Isoform 2 studied here lacks residues 191–215. (B) Binding mode of the AAUAAA RNA by mPSF (Sun et al. 2018). Hydrogen-bonding interactions are shown as dashed lines (red), and the side chains of amino acid residues involved are omitted for clarity. (C) Local resolution map for the quaternary complex. (D) Overall structure of the quaternary complex, colored as in A. The PAS RNA is in orange. The structure of the AAUAAA quaternary complex is shown in gray. The superposition is based on WDR33, and the difference in position of CPSF160 BPB is readily visible. (E) Overall structure of the quaternary complex, viewed after a rotation of 90° around the vertical axis. Panel C was produced with Chimera (Pettersen et al. 2004), and the other structure figures were produced with PyMOL (www.pymol.org), unless noted otherwise.

The molecular basis for the recognition of the AAUAAA PAS was revealed by the structures of its complex with mPSF (Clerici et al. 2018; Sun et al. 2018). The backbone of the PAS hexamer takes on an S shape, and the RNA is sandwiched between the WD40 domain of WDR33 and ZF2 and ZF3 of CPSF30 (Fig. 1B). CPSF160 does not have any direct interactions with the PAS and instead acts as a scaffold to recruit CPSF30 and WDR33 and preorganize them for PAS binding. In the complex with mPSF, the six bases of the PAS hexamer are divided into three pairs that make distinct contacts with CPSF30 and WDR33. Bases A1 and A2 are recognized by ZF2 of CPSF30, while A4 and A5 are recognized by ZF3 (Fig. 1B). Bases U3 and A6 form a Hoogsteen base pair, sandwiched on each side by aromatic side chains from the WD40 domain and its amino-terminal extension in WDR33.

Biophysical studies show that mPSF has low nanomolar affinity for RNA containing the AAUAAA PAS (Clerici et al. 2017; Hamilton et al. 2019), while the affinity for the AUUAAA PAS is weaker and the binding of the other PAS variants is barely detectable or not detectable in the assays. To understand how mPSF recognizes the AUUAAA PAS and other variants, we have studied human mPSF in complex with RNA containing the AUUAAA, AAGAAA, or GAUAAA PAS by cryo-electron microscopy (cryo-EM), and revealed the binding mode of the AUUAAA PAS. However, the AAGAAA and GAUAAA RNAs are not observed by the cryo-EM studies, and the ZF2 and ZF3 of CPSF30 are disordered.

RESULTS

Overall structure of human mPSF in complex with AUUAAA

The structure of human mPSF in complex with an 11-mer RNA oligo containing the AUUAAA sequence, CAUUAAACAAC, has been determined at 2.53 Å resolution by cryo-EM (Table 1; Fig. 1C). The atomic model has good agreement with the EM density and the expected bond lengths, bond angles, and other geometric parameters. As was observed earlier in the AAUAAA complex, CPSF30 was disordered from ZF4 to the carboxyl terminus. Fip1 was included in the protein sample, but was not present in the EM density.

TABLE 1.

Summary of cryo-EM and structure refinement statistics

The overall structure of the mPSF-AUUAAA complex is similar to that of the mPSF-AAUAAA complex, with rms distance of 0.66 Å for 1605 equivalent Cα atoms in CPSF160, WDR33, and CPSF30 (Sun et al. 2018). Nonetheless, there are small differences in the relative positions of the three proteins in the two complexes. For example, superposing WDR33 in the two complexes (rms distance of 0.53 Å for 356 Cα atoms) also brings CPSF30 into close superposition, but small differences are seen in the position of CPSF160, especially its BPB domain that is far away from WDR33 and CPSF30 (Fig. 1D,E). Since this superposition produces a better overlay of WDR33 and CPSF30 in their recognition of the PAS RNA, it is used for the structural analysis on RNA recognition.

For CPSF160, the rms distance is 0.56 Å for 1126 Cα atoms between the two structures, and several loops show conformational differences. Within WDR33, the β-sheets are essentially identical in the two structures. Several loops show noticeable structural differences, although the amino- and carboxy-terminal extensions of the WD40 domain, which mediate interactions with the RNA and CPSF160, assume similar conformations. Larger structural variations are observed for CPSF30 (rms distance of 0.88 Å for 114 Cα atoms), due in part to the weaker EM density for this subunit (Fig. 1C) and to the recognition of the A1–U2 nucleotides (see below). Overall, the binding of AUUAAA RNA did not lead to substantial structural changes in mPSF compared to the binding of AAUAAA RNA.
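For readers unfamiliar with the metric, the rms distances quoted above are root-mean-square distances over paired Cα atoms after the two structures have been superposed. The sketch below shows only the final distance calculation in Python. The function name and the toy coordinates are hypothetical, the superposition step itself is omitted, and the software used for these comparisons is not stated in the text.

```python
import math


def rms_distance(coords_a, coords_b):
    """Root-mean-square distance between two equally long lists of
    (x, y, z) coordinates that are already superposed and paired
    atom-by-atom (for example, equivalent C-alpha atoms)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates in angstroms (not taken from any structure):
# every atom is displaced by exactly 1 A along z.
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rms_distance(model_a, model_b))
```

Because the value depends on which atoms are used for the superposition, the WDR33-based and whole-complex comparisons in the text are not directly interchangeable.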

Binding mode of AUUAAA

The AUUAAA PAS has well-defined EM density (Fig. 2A). Although an 11-mer oligo was used for the structural analysis, only the AUUAAA PAS and the 5′ phosphate of the nucleotide immediately following it (C7) were observed. This phosphate is important for the high affinity binding between mPSF and the PAS (Hamilton et al. 2019). Therefore, analogous to observations with the AAUAAA complex, mPSF only recognizes the PAS hexamer and the following phosphate group and has no stable interactions with other parts of the RNA.

FIGURE 2.

Recognition of the AUUAAA PAS RNA. (A) Cryo-EM density (magenta mesh) for AUUAAA of the PAS RNA (orange), as well as the 5′ phosphate of the nucleotide (C7) immediately following the PAS. (B) Overlay of the binding mode of the AUUAAA PAS (orange) and the AAUAAA PAS (gray). Hydrogen bonds in the U3–A6 Hoogsteen base pair are indicated with dashed lines in red. (C) Binding mode of the AUUAAA PAS (orange) at the interface of CPSF30 ZF2–ZF3 (green) and WDR33 (light blue). (D) Detailed interactions showing the recognition of the A1 and U2 bases of the PAS (orange) by ZF2 of CPSF30 (green). Hydrogen bonds between the A1 and U2 bases and the protein are indicated with dashed lines in red. The binding of the A1 and A2 bases in the AAUAAA complex is shown in gray. (E) Detailed interactions showing the recognition of the A4 and A5 bases of the AUUAAA PAS (orange) by ZF3 of CPSF30 (green). (F) Recognition of the U3–A6 base pair in the AUUAAA PAS (orange) by WDR33 (blue).

The overall binding mode of the AUUAAA PAS is similar to that of AAUAAA (Fig. 2B). The backbone follows an S shape, and the six bases of the PAS are grouped into three pairs: A1–U2, A4–A5, and U3–A6, interacting with ZF2 of CPSF30, ZF3, and WDR33, respectively (Fig. 2C). However, clear conformational differences are observed for the A1 and U2 nucleotides. The 5′ phosphate of U2 moves by 2.6 Å, and its base is located close to the six-membered ring of the A2 base in AAUAAA (Fig. 2B), which allows the smaller U2 base to interact with ZF2 of CPSF30.

The U2 base of AUUAAA makes two hydrogen bonds with ZF2 of CPSF30: one between its N3 amide and the main-chain carbonyl of Leu75, and one between its O4 carbonyl and the main-chain amide of Lys77 (Fig. 2D). The former hydrogen bond is unique to U2, while the latter is equivalent to that from the N1 atom of A2 in AAUAAA. The A1 base of AUUAAA is recognized by hydrogen bonds between its N1 atom and the main-chain amide of Lys69 and between its N6 atom and the main-chain carbonyl of Val67, equivalent to the recognition of A1 in AAUAAA. However, this base in AUUAAA moves by ∼1.5 Å compared to that in AAUAAA, and CPSF30 in this region (residues 67–73) also moves, which maintains the hydrogen-bonding interactions. As observed with AAUAAA, the A1 base is π-stacked with Phe84 on one face and flanked by Lys69 on the other, while the U2 base is π-stacked with His70 on one face and flanked by Lys77 on the other.

In comparison to A1–U2, the binding mode of A4–A5, U3–A6, and the 3′ flanking phosphate (the 5′ phosphate of the nucleotide C7) in the AUUAAA complex is essentially the same as that in the AAUAAA complex. A4 is recognized by hydrogen bonds to its N1 and N6 atoms with ZF3 of CPSF30 as well as a hydrogen bond between its N7 atom and Lys69 of ZF2 (Fig. 2E), the same side chain that flanks the A1 base (Fig. 2D), while A5 has one hydrogen bond. The side chain of Asn107 is located near the A5 N6 atom, but does not have good geometry for a hydrogen bond. The ribose and base of A5 also have interactions with residues 47–49 of WDR33 (Fig. 2E), in the amino-terminal extension to the WD40 domain.

The U3–A6 Hoogsteen base pair interacts with WDR33. The A6 base is π-stacked on one face by Phe153, from the WD40 domain, and on the other face by Phe43, in the amino-terminal extension to the WD40 domain (Fig. 2F). Lys117 from the WD40 domain also flanks this face of A6. The U3 base is flanked by the amide bond between Phe43 and Asp44 on one face and Ile156 on the other.

The 5′ phosphate of the nucleotide (C7) immediately following AUUAAA is recognized by Arg49 and Arg54 in the amino-terminal extension of the WD40 domain of WDR33 (Fig. 2F), consistent with its importance for high affinity binding (Hamilton et al. 2019).

The AUUAAA complex studied here was prepared in a buffer containing 150 mM NaCl, while the AAUAAA complex that we studied earlier was in a buffer containing 350 mM NaCl (Sun et al. 2018). It is unlikely that the observed structural differences are due to the difference in salt concentration. This is supported by the fact that nucleotides U3, A4, A5, and A6 have essentially the same binding modes in the two structures. In addition, the AAUAAA complex studied by another group (Clerici et al. 2018) was in a buffer containing 140 mM KCl, and the same RNA binding mode was observed.

Studies of mPSF in the presence of AAGAAA and GAUAAA

We next aimed to characterize the binding of RNAs with rare PAS hexamers to mPSF by cryo-EM, following the same protocol as that used for the AAUAAA and AUUAAA RNAs. Earlier studies have determined Kd values of 16 nM (Hamilton et al. 2019) and 120 nM (Clerici et al. 2017) for fluorescently labeled AAGAAA RNA binding to mPSF. Binding was detected in a competitive binding assay for GAUAAA but the Kd was estimated at >500 nM, while two other PAS hexamers studied, AAUGAA and AAUACA, showed essentially no binding in this assay (Hamilton et al. 2019). Therefore, we selected AAGAAA and GAUAAA for the structural studies, using oligos with the sequence FAM-AACCUCCAAGAAACAAC and CGAUAAACAAC, respectively. In the samples for structural studies, the AAGAAA and GAUAAA oligos were present at roughly 160× and 5× their reported dissociation constants, respectively. Cryo-EM reconstructions were obtained at 2.68 and 3.12 Å resolution for mPSF in the presence of AAGAAA (Fig. 3A) and GAUAAA (Fig. 3B) RNA, respectively (Table 1).
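The fold-excess figures above follow from the sample concentrations given in Materials and Methods (2.5 µM RNA, 0.7 µM mPSF). The Python sketch below reproduces that arithmetic and, as an added illustration, estimates the fraction of mPSF that would be bound under an idealized 1:1 equilibrium model. The binding model and the function name are assumptions, not part of the reported analysis, and the model ignores any effect of the fluorophore or of grid preparation.

```python
import math


def fraction_protein_bound(protein_total, rna_total, kd):
    """Fraction of protein in complex for ideal 1:1 binding.

    Solves the quadratic for the complex concentration; all three
    arguments must be in the same concentration units.
    """
    s = protein_total + rna_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * protein_total * rna_total)) / 2.0
    return complex_conc / protein_total


RNA_UM = 2.5   # RNA concentration in the cryo-EM sample (from Methods)
MPSF_UM = 0.7  # mPSF concentration in the cryo-EM sample (from Methods)

# Reported Kd values, converted to micromolar.
KD_UM = {
    "AAGAAA, 16 nM (Hamilton et al. 2019)": 0.016,
    "AAGAAA, 120 nM (Clerici et al. 2017)": 0.120,
    "GAUAAA, >500 nM (lower bound on Kd)": 0.500,
}

for label, kd in KD_UM.items():
    fold = RNA_UM / kd
    bound = fraction_protein_bound(MPSF_UM, RNA_UM, kd)
    print(f"{label}: RNA at {fold:.0f}x Kd, ideal fraction bound = {bound:.2f}")
```

Since the GAUAAA value is only a lower bound on Kd, the corresponding fraction is an upper bound within this model.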

FIGURE 3.

Overall structure of the human CPSF160–WDR33–CPSF30 ternary complex. (A) Local resolution map for the sample containing AAGAAA RNA. The lack of RNA binding is visible by comparing the lower right corner with Figure 1C. (B) Local resolution map for the sample containing GAUAAA RNA. (C) Overlay of the structure for the AAGAAA sample (gray, lacking RNA) with that of the AUUAAA quaternary complex (in color). (D) Disordering of ZF2 and ZF3 of CPSF30 and residues 42–54 in the amino-terminal extension to the WD40 domain of WDR33 in the AAGAAA sample (in gray) compared to the AUUAAA quaternary complex (in color). Panels A and B were produced with Chimera.

Neither EM map showed density for the RNA oligo (Fig. 3A,B), despite multiple rounds of classification of both cryo-EM data sets to identify particles that contain RNA. The EM density for the GAUAAA sample is highly similar to that for the AAGAAA sample, and the GAUAAA data set was not pursued further due to its lower resolution. An atomic model was produced for the ordered residues in the EM map from the AAGAAA sample (Table 1).

The overall structure of the CPSF160–WDR33–CPSF30 ternary complex for the AAGAAA sample is similar to that of the AUUAAA complex (rms distance of 0.32 Å for 1495 Cα atoms) (Fig. 3C). Possibly as a result of the absence of RNA, residues 42–54 of WDR33, which π-stack with the U3–A6 base pair and flank the RNA, and ZF2–ZF3 of CPSF30 are disordered (Fig. 3D). On the other hand, the amino-terminal segment and ZF1 of CPSF30 are ordered, as they mediate the interaction with CPSF160. The structure is also similar to that of mPSF in complex with the mPSF interaction motif (PIM) of CPSF100 (rms distance of 0.57 Å for 1509 Cα atoms) (Zhang et al. 2020), which also lacks RNA and in which the equivalent residues in CPSF30 and WDR33 are disordered. The disordering of ZF2–ZF3 in the absence of RNA was also observed in the yeast complex (Casanal et al. 2017).

DISCUSSION

Our studies have revealed the molecular basis for the recognition of the AUUAAA PAS, the second most frequently observed PAS in mammals. While the overall binding mode of AUUAAA is similar to that of AAUAAA, conformational changes for the A1 and U2 nucleotides and ZF2 of CPSF30 are observed, which allow the smaller U2 base to interact with ZF2. Remarkably, the U2 base in AUUAAA makes two good hydrogen bonds with ZF2, while the A2 base in AAUAAA makes only one. On the other hand, RNA containing the AUUAAA PAS has ∼6× lower affinity for mPSF compared to AAUAAA (Hamilton et al. 2019), which corresponds to a difference of ∼1 kcal/mol in binding energy. The changes in the position of CPSF30 ZF2 and in the backbone positions of A1 and U2 may have a negative effect on the overall interactions between mPSF and the AUUAAA PAS RNA. Moreover, the conformational changes could have an impact on the binding kinetics, affecting the on and/or off rates of the RNA. Further studies will be needed to carefully dissect the contributing factors to the observed differences in binding affinity.
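The conversion from a ∼6× affinity difference to ∼1 kcal/mol follows from the standard relation ΔΔG = RT ln(Kd ratio). A minimal Python sketch of that calculation is given below. The temperature of 298.15 K is an assumption, since the text does not state the temperature used for the estimate, and the function name is hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_affinity_ratio(fold_change, temperature_k=298.15):
    """Difference in binding free energy (kcal/mol) corresponding to a
    fold change in dissociation constant: ddG = R * T * ln(fold_change).
    The default temperature is an assumed value."""
    return R_KCAL_PER_MOL_K * temperature_k * math.log(fold_change)


# ~6-fold weaker binding of AUUAAA relative to AAUAAA.
print(f"{ddg_from_affinity_ratio(6.0):.2f} kcal/mol")
```

On this scale, a single good hydrogen bond gained or lost can be comparable in magnitude to the whole difference, which is why structural inspection alone does not settle the energetic accounting.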

We were not able to determine the binding mode of two RNAs with rare PAS hexamers, AAGAAA and GAUAAA. Fluorophore-labeled AAGAAA RNAs were observed to have good affinity for mPSF in binding studies (Clerici et al. 2017; Hamilton et al. 2019). It is likely that the fluorophore enhanced the observed affinity, while the inherent affinity of AAGAAA RNA for mPSF is lower. This is supported by our observation that an unlabeled AAUAAA RNA has a ∼10× higher Kd value for mPSF compared to a labeled one (Hamilton et al. 2019). Although our cryo-EM studies used the same FAM-labeled AAGAAA RNA as that in the binding assays, we failed to observe the RNA or the fluorophore. On the other hand, the cryo-EM data do not necessarily rule out the binding of these two RNAs to mPSF. It is possible that complexes with such lower-affinity RNAs are more prone to disruption during the freezing process used to prepare the cryo-EM grids.

How mPSF recognizes the rare PAS hexamers remains to be determined. The Hoogsteen base pair observed for AAUAAA and AUUAAA hexamers is unlikely to form for AAGAAA, and conformational changes in the RNA and consequently mPSF would be necessary. Our structural observations with AUUAAA suggest there is substantial plasticity in both the RNA and the protein factors for recognizing the PAS. It is also possible that, in the entire 3′-end processing machinery, interactions of other protein factors with additional sequence elements in the pre-mRNA facilitate and stabilize the interactions between mPSF and these rare PAS hexamers. This is consistent with the importance of some of these protein factors (for example CstF64 and CFIm25) in alternative polyadenylation (Takagaki et al. 1996; Gruber et al. 2012; Martin et al. 2012; Yao et al. 2013; Li et al. 2015; Tian and Manley 2017).

MATERIALS AND METHODS

Protein expression and purification

Human CPSF160 (full-length) and WDR33 (residues 1–572) were coexpressed in Tni insect cells (Expression Systems) and purified as described earlier (Sun et al. 2018). Full-length human CPSF30 (isoform 2) and Fip1 (residues 1–243) were cloned into a pFL vector for coexpression in insect cells. CPSF30 carried a carboxy-terminal 6×-His tag. Bacmids were produced by transforming the plasmid into DH10EMBacY competent cells. Sf9 insect cells were transfected with purified bacmids to produce P0 virus. P1 virus for large-scale protein expression was amplified from P0 virus in 50 mL of Sf9 cells. One liter of Hi5 cells was inoculated with 10 mL of P1 virus and incubated at 27°C with constant shaking for 48 h. Cells were pelleted by centrifugation, frozen with liquid nitrogen, and stored at −80°C.

Cell pellets were thawed in a 37°C water bath, resuspended in 100 mL lysis buffer (25 mM Tris [pH 8.0], 300 mM NaCl, and one protease inhibitor tablet [Sigma]), and lysed by sonication. Cell lysate was clarified by centrifugation at 13,000 rpm for 45 min at 4°C. The supernatant was incubated with nickel agarose beads (Qiagen) for 1 h while nutating at 4°C. The beads were washed with 50 bed volumes of wash buffer (25 mM Tris [pH 8.0], 300 mM NaCl, 20 mM imidazole) and eluted with elution buffer (25 mM Tris [pH 8.0], 300 mM NaCl, 250 mM imidazole). The protein was further purified by size exclusion chromatography (Superdex 200, Cytiva) with buffer containing 25 mM Tris (pH 8.0), 300 mM NaCl, and 5 mM DTT. Fractions corresponding to the expected peak were pooled and concentrated to 1 mg/mL. Sample purity was determined by SDS-PAGE and Coomassie staining.

Purified human CPSF160–WDR33 and CPSF30–Fip1 were mixed at a 0.5:0.6 nmol ratio and incubated on ice for 30 min. The complex was then purified using size exclusion chromatography (Superose 6, Cytiva) with a running buffer containing 25 mM Tris (pH 8.0), 150 mM NaCl, and 5 mM DTT. The fractions containing the quaternary complex were pooled and concentrated to ∼0.5 mg/mL. The complex was diluted to 0.2 mg/mL (0.7 µM) in buffer containing 2.5 µM of RNA (AUUAAA, AAGAAA, or GAUAAA). The sample was incubated on ice for 30 min before being loaded onto EM grids.
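As a quick consistency check on the concentrations above, the following Python sketch back-calculates the molar mass implied by the stated 0.2 mg/mL and 0.7 µM values and the molar excess of RNA over complex. The function name is hypothetical, and the implied mass is derived only from the two quoted numbers, not from the protein sequences.

```python
MASS_CONC_MG_PER_ML = 0.2  # complex concentration, from the text
MOLAR_CONC_UM = 0.7        # the same concentration in micromolar, from the text
RNA_CONC_UM = 2.5          # RNA concentration, from the text


def implied_molar_mass_kda(mg_per_ml, micromolar):
    """Molar mass (kDa) implied by a mass and a molar concentration."""
    grams_per_liter = mg_per_ml          # 1 mg/mL equals 1 g/L
    mol_per_liter = micromolar * 1e-6
    return grams_per_liter / mol_per_liter / 1000.0  # g/mol -> kDa


mass_kda = implied_molar_mass_kda(MASS_CONC_MG_PER_ML, MOLAR_CONC_UM)
rna_excess = RNA_CONC_UM / MOLAR_CONC_UM

print(f"implied molar mass: {mass_kda:.0f} kDa")
print(f"RNA molar excess over complex: {rna_excess:.1f}x")
```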

RNA oligo sequences

The sequences of the RNA oligos used in this study are: CAUUAAACAAC (PAS: AUUAAA), FAM-AACCUCCAAGAAACAAC (PAS: AAGAAA), and CGAUAAACAAC (PAS: GAUAAA). These oligos were also used in the earlier biophysical studies (Hamilton et al. 2019).

EM data collection and processing

EM grid preparation and data collection were performed as described earlier (Sun et al. 2018). EM grids were screened on a Glacios or TF20 microscope at the New York Structural Biology Center, and EM movies for structure determination were collected on a Titan Krios electron microscope with a K3 detector at the Columbia University Cryo-Electron Microscopy Center. Over 4000 image stacks were collected for each sample (AUUAAA: 4700, AAGAAA: 5346, GAUAAA: 4300 image stacks, Table 1).

Cryo-EM data sets were motion-corrected and dose weighted using RELION (Zivanov et al. 2018). All subsequent data processing and 3D reconstruction were carried out in cryoSPARC (Punjani et al. 2017), following a similar protocol (Supplemental Fig. S1A).

The particle templates and initial 3D volumes were generated using the AAGAAA data set. Particles were selected with the blob picker for 200 micrographs, and templates were selected after 2D classification. Particles were then selected using the template picker for 2800 micrographs, and good particles after 2D classification were used to generate six volumes by ab initio reconstruction.

For the AUUAAA data set, 4,332,000 particles were selected using the template picker, with templates from the AAGAAA data set, and 3,851,000 particles were extracted from the micrographs with 256 × 256 boxes, Fourier cropped to 128 × 128 pixels. Heterogeneous refinement was used to classify all the particles (without 2D classification) into six classes in 3D, using the volumes from the AAGAAA ab initio reconstruction, and 1,358,000 particles were saved. These particles were reextracted from micrographs in 256 × 256 boxes, unbinned, and subjected to another round of heterogeneous refinement. The resulting 1,052,000 particles then underwent another round of heterogeneous refinement, this time into two classes starting with the density of the complex. After the classification, one class had weak density for CPSF30 and the RNA and was discarded. The 858,000 particles in the other class were submitted to a homogeneous refinement, resulting in a density map at 2.53 Å resolution (Table 1; Supplemental Fig. S1B). The AAGAAA and GAUAAA data sets were processed in a similar manner and resulted in density maps of 2.68 and 3.12 Å resolution, respectively (Table 1; Supplemental Fig. S1C,D).
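The particle counts quoted above for the AUUAAA data set can be summarized as a retention table. The Python sketch below uses only the numbers given in the text. The stage labels are paraphrased and the helper name `retention` is hypothetical; this is a bookkeeping aid, not part of the cryoSPARC workflow itself.

```python
# (stage label, particle count) for the AUUAAA data set, from the text.
STAGES = [
    ("template picking", 4_332_000),
    ("extraction (256 px boxes, cropped to 128 px)", 3_851_000),
    ("heterogeneous refinement, six classes", 1_358_000),
    ("second heterogeneous refinement (unbinned)", 1_052_000),
    ("two-class heterogeneous refinement, kept class", 858_000),
]


def retention(stages):
    """Return (label, count, fraction of previous stage, fraction of first)."""
    rows = []
    first = stages[0][1]
    previous = first
    for label, count in stages:
        rows.append((label, count, count / previous, count / first))
        previous = count
    return rows


for label, count, step_frac, total_frac in retention(STAGES):
    print(f"{label:50s} {count:>9,d}  step {step_frac:6.1%}  overall {total_frac:6.1%}")
```

Roughly one in five picked particles contributes to the final homogeneous refinement.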

Model building and refinement

The structure of the AAUAAA complex (PDB 6DNH) (Sun et al. 2018) was used as the starting model and docked into the cryo-EM density map of the AUUAAA complex. The model was manually modified in Coot (Emsley and Cowtan 2004) and refined in PHENIX using real-space refinement (Table 1; Adams et al. 2002).

The cryo-EM map of the AAGAAA data set was interpreted similarly. The ZF2 and ZF3 of CPSF30 were disordered and the RNA was not observed. They are not included in the atomic model (Table 1).

SUPPLEMENTAL MATERIAL

Supplemental material is available for this article.

ACKNOWLEDGMENTS

We thank Keith Hamilton for providing the RNA oligos. Some of this work was performed at the Columbia University Cryo-Electron Microscopy Center. We thank Zhening Zhang and Robert Grassucci for help with cryo-EM data collection. This research is supported by National Institutes of Health (NIH) grant R35GM118093 (to L.T.).

Author contributions: P.A.G. produced CPSF30-Fip1 and the mPSF-RNA complexes for cryo-EM and analyzed the EM data. J.W. prepared the EM grids and helped with the cryo-EM data collection. Y.S. produced the CPSF160–WDR33 protein sample. L.T. designed the experiments, analyzed the EM data and supervised the research. P.A.G. and L.T. wrote the paper, and all authors commented on the paper.

REFERENCES

  1. Adams PD, Grosse-Kunstleve RW, Hung L-W, Ioerger TR, McCoy AJ, Moriarty NW, Read RJ, Sacchettini JC, Sauter NK, Terwilliger TC. 2002. PHENIX: building a new software for automated crystallographic structure determination. Acta Crystallogr D Biol Crystallogr 58: 1948–1954. 10.1107/S0907444902016657
  2. Beaudoing E, Freier S, Wyatt JR, Claverie JM, Gautheret D. 2000. Patterns of variant polyadenylation signal usage in human genes. Genome Res 10: 1001–1010. 10.1101/gr.10.7.1001
  3. Casanal A, Kumar A, Hill CH, Easter AD, Emsley P, Degliesposti G, Gordiyenko Y, Santhanam B, Wolf J, Wiederhold K, et al. 2017. Architecture of eukaryotic mRNA 3′-end processing machinery. Science 358: 1056–1059. 10.1126/science.aao6535
  4. Chan SL, Huppertz I, Yao C, Weng L, Moresco JJ, Yates JR III, Ule J, Manley JL, Shi Y. 2014. CPSF30 and Wdr33 directly bind to AAUAAA in mammalian mRNA 3′ processing. Genes Dev 28: 2370–2380. 10.1101/gad.250993.114
  5. Clerici M, Faini M, Aebersold R, Jinek M. 2017. Structural insights into the assembly and polyA signal recognition mechanism of the human CPSF complex. eLife 6: e33111. 10.7554/eLife.33111
  6. Clerici M, Faini M, Muckenfuss LM, Aebersold R, Jinek M. 2018. Structural basis of AAUAAA polyadenylation signal recognition by the human CPSF complex. Nat Struct Mol Biol 25: 135–138. 10.1038/s41594-017-0020-6
  7. Edwalds-Gilbert G, Veraldi KL, Milcarek C. 1997. Alternative poly(A) site selection in complex transcription units: means to an end? Nucleic Acids Res 25: 2547–2561. 10.1093/nar/25.13.2547
  8. Emsley P, Cowtan KD. 2004. Coot: model-building tools for molecular graphics. Acta Crystallogr D Biol Crystallogr 60: 2126–2132. 10.1107/S0907444904019158
  9. Fitzgerald M, Shenk T. 1981. The sequence 5′-AAUAAA-3′ forms parts of the recognition site for polyadenylation of late SV40 mRNAs. Cell 24: 251–260. 10.1016/0092-8674(81)90521-3
  10. Gautheret D, Poirot O, Lopez F, Audic S, Claverie JM. 1998. Alternate polyadenylation in human mRNAs: a large-scale analysis by EST clustering. Genome Res 8: 524–530. 10.1101/gr.8.5.524
  11. Gilmartin GM, McDevitt MA, Nevins JR. 1988. Multiple factors are required for specific RNA cleavage at a poly(A) addition site. Genes Dev 2: 578–587. 10.1101/gad.2.5.578
  12. Gruber AR, Martin G, Keller W, Zavolan M. 2012. Cleavage factor Im is a key regulator of 3′ UTR length. RNA Biol 9: 1405–1412. 10.4161/rna.22570
  13. Gruber AJ, Schmidt R, Gruber AR, Martin G, Ghosh S, Belmadani M, Keller W, Zavolan M. 2016. A comprehensive analysis of 3′ end sequencing data sets reveals novel polyadenylation signals and the repressive role of heterogeneous ribonucleoprotein C on cleavage and polyadenylation. Genome Res 26: 1145–1159. 10.1101/gr.202432.115
  14. Hamilton K, Sun Y, Tong L. 2019. Biophysical characterizations of the recognition of the AAUAAA polyadenylation signal. RNA 25: 1673–1680. 10.1261/rna.070870.119
  15. Li W, You B, Hoque M, Zheng D, Luo W, Ji Z, Park JY, Gunderson SI, Kalsotra A, Manley JL, et al. 2015. Systematic profiling of poly(A)+ transcripts modulated by core 3′ end processing and splicing factors reveals regulatory rules of alternative cleavage and polyadenylation. PLoS Genet 11: e1005166. 10.1371/journal.pgen.1005166
  16. Martin G, Gruber AR, Keller W, Zavolan M. 2012. Genome-wide analysis of pre-mRNA 3′ end processing reveals a decisive role of human cleavage factor I in the regulation of 3′ UTR length. Cell Rep 1: 753–763. 10.1016/j.celrep.2012.05.003
  17. Murthy KGK, Manley JL. 1992. Characterization of the multisubunit cleavage-polyadenylation specificity factor from calf thymus. J Biol Chem 267: 14804–14811. 10.1016/S0021-9258(18)42111-4
  18. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE. 2004. UCSF Chimera-a visualization system for exploratory research and analysis. J Comput Chem 25: 1605–1612. 10.1002/jcc.20084
  19. Proudfoot NJ. 2011. Ending the message: poly(A) signals then and now. Genes Dev 25: 1770–1782. 10.1101/gad.17268411
  20. Proudfoot NJ, Brownlee GG. 1974. Sequence at the 3′ end of globin mRNA shows homology with immunoglobulin light chain mRNA. Nature 252: 359–362. 10.1038/252359a0
  21. Punjani A, Rubinstein JL, Fleet DJ, Brubaker MA. 2017. cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination. Nat Methods 14: 290–296. 10.1038/nmeth.4169
  22. Schonemann L, Kuhn U, Martin G, Schafer P, Gruber AR, Keller W, Zavolan M, Wahle E. 2014. Reconstitution of CPSF active in polyadenylation: recognition of the polyadenylation signal by WDR33. Genes Dev 28: 2381–2393. 10.1101/gad.250985.114
  23. Shi Y, Manley JL. 2015. The end of the message: multiple protein-RNA interactions define the mRNA polyadenylation site. Genes Dev 29: 889–897. 10.1101/gad.261974.115
  24. Sun Y, Zhang Y, Hamilton K, Manley JL, Shi Y, Walz T, Tong L. 2018. Molecular basis for the recognition of the human AAUAAA polyadenylation signal. Proc Natl Acad Sci 115: E1419–E1428. 10.1073/pnas.1718723115
  25. Sun Y, Hamilton K, Tong L. 2020. Recent molecular insights into canonical pre-mRNA 3′-end processing. Transcription 11: 83–96. 10.1080/21541264.2020.1777047
  26. Takagaki Y, Seipelt RL, Peterson ML, Manley JL. 1996. The polyadenylation factor CstF-64 regulates alternative processing of IgM heavy chain pre-mRNA during B cell differentiation. Cell 87: 941–952. 10.1016/S0092-8674(00)82000-0
  27. Tian B, Manley JL. 2017. Alternative polyadenylation of mRNA precursors. Nat Rev Mol Cell Biol 18: 18–30. 10.1038/nrm.2016.116
  28. Tian B, Hu J, Zhang H, Lutz CS. 2005. A large-scale analysis of mRNA polyadenylation of human and mouse genes. Nucleic Acids Res 33: 201–212. 10.1093/nar/gki158
  29. Wickens M, Stephenson P. 1984. Role of the conserved AAUAAA sequence: four AAUAAA point mutations prevent messenger RNA 3′ end formation. Science 226: 1045–1051. 10.1126/science.6208611
  30. Yang Q, Doublie S. 2011. Structural biology of poly(A) site definition. Wiley Interdiscip Rev RNA 2: 732–747. 10.1002/wrna.88
  31. Yao C, Choi EA, Weng L, Xie X, Wan J, Xing Y, Moresco JJ, Tu PG, Yates JR III, Shi Y. 2013. Overlapping and distinct functions of CstF64 and CstF64tau in mammalian mRNA 3′ processing. RNA 19: 1781–1790. 10.1261/rna.042317.113
  32. Zhang Y, Sun Y, Shi Y, Walz T, Tong L. 2020. Structural insights into the human pre-mRNA 3′-end processing machinery. Mol Cell 77: 800–809. 10.1016/j.molcel.2019.11.005
  33. Zivanov J, Nakane T, Forsberg BO, Kimanius D, Hagen WJH, Lindahl E, Scheres SH. 2018. New tools for automated high-resolution cryo-EM structure determination in RELION-3. eLife 7: e42166. 10.7554/eLife.42166

Articles from RNA are provided here courtesy of The RNA Society
