Abstract
Cleavage and polyadenylation specificity factor 30 (CPSF30) is a zinc finger protein that regulates pre-mRNA processing. CPSF30 contains five CCCH domains and one CCHC domain and recognizes two conserved 3’ pre-mRNA sequences: an AU-hexamer and a U-rich motif. AU-hexamer motifs are common in pre-mRNAs and are typically defined as AAUAAA. Variations within the AAUAAA hexamer occur in certain pre-mRNAs and can affect polyadenylation efficiency or be linked to diseases. The effects of disease-related variations on CPSF30/pre-mRNA binding were determined using a construct of CPSF30 that contains just the five CCCH domains (CPSF30-5F). Bioinformatics was utilized to identify the variability within the AU hexamer sequence in pre-mRNAs. The effects of this sequence variability on CPSF30-5F/RNA binding affinities were measured. Bases at positions 1, 2, 4 and 5 within the AU hexamer were found to be important for RNA binding. Bioinformatics revealed that the three bases flanking the AU hexamer at the 5’ and 3’ ends are twice as likely to be adenine or uracil as guanine or cytosine. The presence of A and U residues in these flanking regions was determined to promote higher affinity CPSF30-5F/RNA binding than G and C residues. The addition of the zinc knuckle domain to CPSF30-5F (CPSF30-FL) restored binding to AU-hexamer variants. This binding restoration is connected to the presence of a U-rich sequence within the pre-mRNA to which the zinc knuckle binds. A mechanism of differential RNA binding by CPSF30, modulated by accessibility of the two RNA binding sites, is proposed.
INTRODUCTION
In eukaryotes, zinc finger proteins (ZFs) play important roles in transcription, translation and regulation. It is estimated that 5% of all human proteins are ZFs, and these proteins share the common feature of having modular domains with a combination of cysteine and histidine residues that function as ligands to bind zinc.1 Once zinc is bound, ZFs adopt distinct secondary structures that facilitate RNA, DNA and protein binding.2-9 ZFs are classed based upon their domain identity and architecture, and a class that has received considerable attention in recent years is the CCCH class.3, 10, 11 The CCCH ZF motif has been identified in the sequence of numerous proteins with roles in RNA regulation including RNA metabolism, splicing, decay, and translation.2, 3, 10, 11 CCCH type ZFs typically have between 1-6 domains, and there is emerging evidence that ZFs with CCCH domains preferentially recognize AU rich RNA target sequences.3, 10-15
One member of the CCCH class of ZFs is cleavage and polyadenylation specificity factor 30 (CPSF30).16 CPSF30 contains five CCCH ZF domains and a singular CCHC or zinc knuckle domain. CPSF30 is part of a complex of proteins called CPSF that regulate pre-mRNA processing.13, 17-21 During pre-mRNA processing, the CPSF complex facilitates recognition of the polyadenylation sequence (PAS), cleavage of the 3’ end, recruitment of poly(A) polymerase, and the subsequent addition of a poly adenosine tail, also known as polyadenylation (Figure 1A). Studies utilizing nuclear extracts that contain the CPSF proteins, the purified CPSF complex, and in vitro translated CPSF proteins have identified some of the protein-protein and protein-RNA interactions within the CPSF complex.16, 22-24 The individual roles of the isolated CPSF proteins are not well understood. Our laboratory has previously reported the isolation of full length CPSF30 (CPSF30-FL) and a construct that contains just the five CCCH domains (CPSF30-5F).13, 25 ZFs with CCCH domains often recognize AU-rich RNA target sequences.3, 13, 26 AU-rich sequences are present in most pre-mRNAs, therefore we investigated whether isolated CPSF30 binds to AU-rich RNA sequences. We reported that both CPSF30-FL and CPSF30-5F bind to the AU rich sequence present within alpha-synuclein pre-mRNA with high affinity in a cooperative manner.13, 25 These results identified a role for CPSF30 in direct RNA binding.13, 25, 27 The AU-rich sequence that CPSF30 binds to in pre-mRNA has the motif AAUAAA. This sequence is called the AU hexamer and is centrally located in the polyadenylation signal (PAS).19 The AU hexamer is present in the majority of pre-mRNAs, suggesting that this is a general target for CPSF30 (Figure 1B).
Figure 1.
(A) Schematic representation of polyadenylation site recognition and processing. The upstream element or USE is most commonly described as U-rich, the cleavage site most commonly occurs at a cytosine-adenosine dinucleotide sequence located approximately 15-30 nt downstream from the central AU hexamer (PAS), and the downstream element (DSE) is most commonly U or G/U rich. (B) Polyadenylation site occurrence based on mRNA expression frequencies determined in this study.
In our work to identify the binding partner of CPSF30, we made the discovery that one of the CCCH domains has a 2Fe-2S cluster co-factor in lieu of zinc, with the remaining 4 CCCH domains binding zinc.13 This co-factor must be present for optimal RNA binding by CPSF30, in addition to zinc, indicating a functional role for both metals.13 We also reported that both CPSF30-FL and CPSF30-5F (which only contains the five CCCH domains) bind to the AU hexamer with similar affinities, suggesting that the CCCH domains are important for RNA recognition.25 Although there are not yet any structures of CPSF30 bound to RNA alone, two cryo-EM structures of the CPSF complex provide additional support for the role of the CCCH domains in binding to the AU hexamer.28, 29 In these structures, only ZF1-3 are visible with ZF2 and ZF3 directly binding to a short fragment of RNA (10 and 17 mer) at adenosines 1/2 and 4/5 of the AAUAAA sequence.28, 29 Notably, ZF2 and ZF3 of CPSF30 are not sufficient to bind to RNA in isolation. We reported that a construct of CPSF30 containing just ZF2 and ZF3 does not bind to the AU hexamer pre-mRNA sequence.25
CPSF30 also binds to poly uracil RNA.16, 25 This binding is only observed when the CCHC ‘zinc knuckle’ domain is present. CPSF30-5F, which contains just the five CCCH domains, does not bind poly uracil RNA, indicating that the zinc knuckle present in CPSF30-FL is involved in direct binding.25 Taken together, these biochemical results reveal that CPSF30 plays a role in pre-mRNA processing by binding to specific pre-mRNA sequences, facilitating the transition from pre-mRNA to mRNA.
There are a number of diseases linked to aberrant polyadenylation.20, 30-37 Of these diseases, at least five are associated with mutated PAS hexamer sequences within their pre-mRNA sequences. This led us to hypothesize that variants within the PAS hexamer would result in abrogation of CPSF30 binding to pre-mRNA. To test this hypothesis, we utilized fluorescence anisotropy (FA) to determine the effects of these variants on CPSF30/RNA binding. Herein, we report that certain variants have a profound effect on CPSF30/RNA binding, while others do not. We link these findings to specific hydrogen bonding interactions between amino acid residues on CPSF30 and its partner RNA. We expand these studies to interrogate the general role of the AU hexamer in CPSF30/RNA binding, using both experimental and bioinformatics approaches. Studies to understand how CPSF30 discriminates between the AU hexamer and poly uracil RNA sequences are also described, and a model for CPSF30/AU hexamer/polyU binding is proposed.
MATERIALS AND METHODS
Cloning, expression, and purification of holo-CPSF30-5F and holo-CPSF30-FL.
Cloning, expression and purification followed our published methods.13, 25, 27 Briefly, CPSF30 DNA from the Bos taurus homolog was obtained from Drs. G. Martin and W. Keller (University of Basel, Basel, Switzerland). CPSF30-5F (UniProtKB O19137) (AA 33-170) and full length (UniProtKB O19137) (AA 1-243) constructs were cloned into the pMAL-c5e plasmid utilizing NdeI and BamHI restriction sites and confirmed at the University of Maryland, Baltimore’s Biopolymer-Genomics core facility. BL21-DE3 cells were transformed via heat shock with either holo-CPSF30-5F or holo-CPSF30-FL plasmids. Transformed cells were incubated at 37°C for 45 minutes. Overnight cultures containing 50 mL of Lennox modified LB broth supplemented with 100 μg/mL of ampicillin were inoculated with 150 μL of transformed cells and allowed to grow overnight at 37°C with shaking. The following day, 10-15 mL of overnight culture was used to inoculate 4 L culture flasks containing 1 L of Lennox modified LB broth supplemented with 100 μg/mL ampicillin and 0.2% (w/v) glucose. Cell cultures were grown at 37°C with shaking to an optical density of approximately 0.3, at which point flasks were supplemented with 0.8 mM ZnCl2 and 0.6 mM FeCl3. Once cultures reached an optical density between 0.5 and 0.6, they were supplemented with 0.4 mM Na2S•9H2O and induced with 1 mM IPTG. Flasks were incubated at 37°C for 3 hours with shaking, and the cells were then centrifuged at 7800 x g for 20 minutes at 4°C and the pellets stored at −20°C.
For purification of holo-CPSF30-5F, pellets were re-suspended in 25 mL of buffer (20 mM Tris and 200 mM NaCl at pH 7.5). A protease inhibitor tablet was added to prevent protein degradation. Sonication was utilized for cell lysis, and the cell lysate was then centrifuged at 17,710 x g for 20 minutes at 4°C. The sonicated supernatant was loaded onto an amylose resin gravity column (15 mL bed volume) and allowed to flow through. The resin was washed with 45 mL of cold buffer (20 mM Tris, 200 mM NaCl, at pH 7.5) four times. Protein was eluted three times with 15 mL of cold elution buffer (20 mM Tris, 200 mM NaCl, 30 mM maltose, at pH 7.5), producing a total of 45 mL of isolated protein. The UV-visible spectrum of each elution was recorded and the protein was concentrated utilizing a 30 kDa molecular weight cut off spin filter. Holo-CPSF30-5F was buffer exchanged for fluorescence anisotropy studies (20 mM Tris, 100 mM NaCl, at pH 8). The protein concentration was determined utilizing a calculated extinction coefficient of 85400 M−1 cm−1 at 278 nm and purity was assessed with SDS-PAGE, while metal occupancy was determined utilizing ICP-MS. Holo-CPSF30-FL was expressed and purified similarly to holo-CPSF30-5F, with the following exceptions. Sonicated supernatant was supplemented with 300 mM NaCl and added to 15 mL of amylose resin. The supernatant and resin were incubated for 15-20 minutes with shaking and then allowed to flow through. The salt concentration was decreased with subsequent washes and protein was eluted utilizing a 20 mM Tris, 200 mM NaCl, and 30 mM maltose buffer at pH 7.5. The calculated extinction coefficient utilized for holo-CPSF30-FL was 88200 M−1 cm−1.
Inductively coupled plasma mass spectrometry (ICP-MS).
Protein metal occupancy was determined utilizing ICP-MS as previously described.25, 27 Protein samples were diluted to 1 μM with 6% nitric acid with a final volume of 2 mL. A mixing T was utilized to run sample and internal standard containing Rh, Bi, Ge, and Sc to the nebulizer. Samples were measured in He mode to prevent interference with argon oxide.
Fluorescence Anisotropy (FA).
An FA binding assay was utilized to measure the affinities of CPSF30 (5F and full length) for RNA variants. RNA oligonucleotides with a 3’ 6-FAM (6-carboxyfluorescein) conjugated fluorescent molecule were purchased from Sigma at HPLC purified grade. ISS K2 and PC-1 spectrofluorometers were configured in the L format and utilized for FA binding studies. FA experiments were conducted by employing an excitation wavelength/slit width of 495 nm/2 mm and emission wavelength/slit width of 517 nm/1 mm. FA experiments were performed in 20 mM Tris, 100 mM NaCl, 0.3 mg/mL Poly(C), and 0.1 mg/mL bovine serum albumin at pH 8 with 5 nM fluorescently labeled RNA in 5 mm Spectrosil far-UV quartz window fluorescence cuvettes (Starna Cells). Either CPSF30-FL or CPSF30-5F was titrated into the cuvette until saturation. The anisotropy and total fluorescence intensity were then recorded. Raw anisotropy values were volume corrected and then corrected for changes in fluorophore quantum yield utilizing the following equation:
Where rc is the corrected anisotropy, r0 is the anisotropy of the free fluorescently labeled RNA, rb is the anisotropy of the protein-RNA complex at saturation and r is the raw anisotropy.
Data were then analyzed utilizing Graphpad Prism 5 fit to a cooperative binding model as follows:
Where h is the Hill coefficient, rtc is the total corrected anisotropy, [P] is the protein concentration and [P]1/2 is the protein concentration at half the saturation value. All titrations were performed at least in triplicate and conducted in tandem with a positive control of either α-syn24 or α-syn30 to confirm protein/RNA binding activity.
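The cooperative model referred to above is not reproduced in this text. As a minimal sketch, assuming the standard Hill form in which the fraction of RNA bound is [P]^h / ([P]1/2^h + [P]^h) and the anisotropy scales linearly between the free and bound values, the fitted function could look like the following. The function names and the anisotropy limits (0.05 and 0.15) are hypothetical; the Hill coefficient and [P]1/2 are the α-syn30 values listed in Table 2.

```python
def hill_fraction(protein_nM, p_half_nM, h):
    """Fraction of RNA bound under an assumed Hill (cooperative) model."""
    if protein_nM <= 0:
        return 0.0
    ratio = (protein_nM / p_half_nM) ** h
    return ratio / (1.0 + ratio)


def model_anisotropy(protein_nM, r_free, r_bound, p_half_nM, h):
    """Anisotropy predicted by linear scaling between free and bound RNA."""
    return r_free + (r_bound - r_free) * hill_fraction(protein_nM, p_half_nM, h)


# alpha-syn30 parameters from Table 2; r_free and r_bound are hypothetical.
H, P_HALF = 1.8, 178.0
for conc in (50.0, 178.0, 500.0):
    print(conc, round(model_anisotropy(conc, 0.05, 0.15, P_HALF, H), 4))
```

In this form, [P]1/2 is simply the protein concentration at which the predicted anisotropy sits halfway between the free and bound values, and h controls the steepness of the transition.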
Fluorescence anisotropy competition assays.
A forward titration of CPSF30-FL protein into fluorescently labeled RNA was conducted as described in the “Fluorescence Anisotropy (FA)” methods section to determine the protein concentrations needed to reach 75% complex formation. Competition assays were performed by first adding CPSF30-FL protein to achieve a starting anisotropy value equal to approximately 75% complex formation (301 nM protein for the α-syn24 complex and 198 nM for the polyU complex). Unlabeled RNA was then titrated into the cuvette until the anisotropy plateaued at the value measured for free fluorescently labeled RNA.
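The starting protein concentration for a competition assay can be read off a fitted forward titration. The sketch below assumes a Hill-type binding curve and inverts it to find the protein concentration that gives a chosen fraction of complex. The function name and the fit parameters in the example are hypothetical and are not the values behind the 301 nM and 198 nM concentrations reported above.

```python
def protein_for_fraction(fraction_bound, p_half_nM, h):
    """Protein concentration giving `fraction_bound` under an assumed Hill model."""
    if not 0.0 < fraction_bound < 1.0:
        raise ValueError("fraction_bound must be between 0 and 1 (exclusive)")
    # Invert f = P^h / (P_half^h + P^h) for P.
    return p_half_nM * (fraction_bound / (1.0 - fraction_bound)) ** (1.0 / h)


# Hypothetical fit parameters, not values reported in this study.
start_nM = protein_for_fraction(0.75, p_half_nM=200.0, h=2.0)
print(round(start_nM, 1))
```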
RNA oligonucleotides.
All RNA oligonucleotides were purchased desalted and HPLC purified and their sequences are listed below in Table 1.
Table 1.
RNA oligomers utilized in this study. The polyadenylation sequence is underlined and PAS variants are shown in red
RNA Name | RNA Sequence |
---|---|
ASYN-24 | CACUUUAAUAAUAAAAAUCAUGCU |
ASYN-30 | UCUCACUUUAAUAAUAAAAAUCAUGCUUAU |
α-thalassemia | UCUCACUUUAAUAAUAAGAAUCAUGCUUAU |
β-thalassemia | UCUCACUUUAAUAACAAAAAUCAUGCUUAU |
IPEX | UCUCACUUUAAUAAUGAAAAUCAUGCUUAU |
LUPUS | UCUCACUUUAAUAAUAGAAAUCAUGCUUAU |
FABRY | UCUCACUUUAAUAUUAAGAAUCAUGCUUAU |
P2AtoG | UCUCACUUUAAUAGUAAAAAUCAUGCUUAU |
P2AtoU | UCUCACUUUAAUAUUAAAAAUCAUGCUUAU |
P4AtoC | UCUCACUUUAAUAAUCAAAAUCAUGCUUAU |
P5AtoC | UCUCACUUUAAUAAUACAAAUCAUGCUUAU |
P6AtoC | UCUCACUUUAAUAAUAACAAUCAUGCUUAU |
Most freq fl | UCUCACUUUAAAAAUAAAAAACAUGCUUAU |
Least freq fl | UCUCACUUUUCGAAUAAAUCGCAUGCUUAU |
ARE-16 | AUUAUUUAUUUAUUUA |
ARE-18 | UUAUUAUUUAUUUAUUUA |
ARE-20 | UUUAUUAUUUAUUUAUUUAG |
ARE-24 | AUUUAUUUAUUAUUUAUUUAUUUA |
ARE-30 | UUUAUUAUUUAUUUAUUAUUUAUUUAUUUA |
ARE-38 | GUGAUUAUUUAUUAUUUAUUUAUUAUUUAUUUAUUUAG |
PAS hexamer bioinformatics.
Polyadenylation signal sequences of Homo sapiens were obtained from PolyASite Atlas V2.0 (GRCh38.96).38 Up to 432,444 clusters containing one of the following poly(A) signals residing in a region of 60 nucleotides upstream to 10 nucleotides downstream of a representative poly(A) site of the cluster were included in the analysis: AAUAAA, AUUAAA, UAUAAA, AGUAAA, AAUACA, CAUAAA, AAUAUA, GAUAAA, AAUGAA, AAUAAU, AAGAAA, ACUAAA, AAUAGA, AUUACA, AACAAA, AUUAUA, AACAAG and AAUAAG, following the same criteria as Herrmann et al.38 The frequencies of each hexamer were calculated in three forms of analysis: (1) site based frequency, (2) cluster based frequency, and (3) mRNA expression weighted frequency, determined as follows:
The data were further analyzed by a sequence logo algorithm to extract motif pattern information.39
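The three frequency measures described above can be illustrated with a short sketch. The exact formula for the expression weighted frequency is not reproduced in this text, so the version here is an assumption: each signal occurrence is weighted by the expression of its cluster and the weights are normalized to sum to 1. The function name, the data layout, and the toy clusters are hypothetical.

```python
from collections import Counter


def hexamer_frequencies(clusters):
    """Return {hexamer: (site, cluster, expression-weighted) frequency}.

    `clusters` is a list of (signals, expression) pairs, where `signals`
    lists the hexamers found near one poly(A) site cluster.
    """
    site_counts = Counter()
    cluster_counts = Counter()
    weighted = Counter()
    for signals, expression in clusters:
        site_counts.update(signals)          # every signal occurrence
        cluster_counts.update(set(signals))  # one count per cluster
        for hexamer in signals:
            weighted[hexamer] += expression  # assumed weighting scheme
    total_sites = sum(site_counts.values())
    total_weight = sum(weighted.values())
    return {
        hexamer: (
            site_counts[hexamer] / total_sites,
            cluster_counts[hexamer] / len(clusters),
            weighted[hexamer] / total_weight,
        )
        for hexamer in site_counts
    }


# Toy data for illustration only.
toy_clusters = [
    (["AAUAAA", "AUUAAA"], 10.0),
    (["AAUAAA"], 30.0),
    (["AAGAAA"], 5.0),
    (["AUUAAA", "AAUAAA"], 5.0),
]
frequencies = hexamer_frequencies(toy_clusters)
print(frequencies["AAUAAA"])
```

Because a cluster can contain more than one signal, the cluster based frequencies need not sum to 1, whereas the site based frequencies do.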
PAS hexamer flanking region bioinformatics.
Messenger RNA sequences of Homo sapiens were obtained from the NCBI reference sequence database. 3’ end sequences of transcripts were parsed by regular expression, and 5’ and 3’ end flanking regions of the canonical poly(A) signal (AAUAAA) were extracted.40 Frequencies of each nucleotide at a position in the immediate 5’ and 3’ flanking sequences were calculated by counting its occurrences at each up- and downstream position in all transcripts carrying the conserved AAUAAA hexamer.
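A minimal sketch of the flanking region analysis described above is given below. It assumes three nucleotide flanks on each side of AAUAAA and numbers positions in the 5’ to 3’ direction; the regular expression, function name, and position ordering are illustrative assumptions rather than the pipeline actually used.

```python
import re
from collections import Counter

# A lookahead is used so that overlapping AAUAAA occurrences are all reported.
PAS_WITH_FLANKS = re.compile(r"(?=([ACGU]{3})AAUAAA([ACGU]{3}))")


def flank_position_frequencies(sequences):
    """Per-position base frequencies in the 3-nt flanks of AAUAAA."""
    five_prime = [Counter() for _ in range(3)]
    three_prime = [Counter() for _ in range(3)]
    for sequence in sequences:
        rna = sequence.upper().replace("T", "U")
        for match in PAS_WITH_FLANKS.finditer(rna):
            upstream, downstream = match.group(1), match.group(2)
            for index in range(3):
                five_prime[index][upstream[index]] += 1
                three_prime[index][downstream[index]] += 1

    def normalise(counters):
        return [
            {base: c[base] / sum(c.values()) for base in "AUCG"} if c else {}
            for c in counters
        ]

    return normalise(five_prime), normalise(three_prime)


# The alpha-syn30 oligonucleotide from Table 1 as a one-sequence example.
five, three = flank_position_frequencies(["UCUCACUUUAAUAAUAAAAAUCAUGCUUAU"])
print(five, three)
```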
RESULTS AND DISCUSSION
Disease related PAS variants abrogate CPSF30-5F/RNA binding.
The region of pre-mRNA that CPSF30 binds to is called the polyadenylation signal (PAS) (Figure 1A). Within the PAS, there is a conserved sequence called the central hexamer. This sequence is most commonly AAUAAA, and our laboratory has shown that CPSF30 alone binds to this sequence with high affinity in a cooperative manner when CPSF30 is loaded with both Zn and Fe (2Fe-2S cluster).13 Similarly, high affinity binding of CPSF30 to the AU hexanucleotide motif has been reported when the protein is in complex with other CPSF proteins (CPSF160-WDR33-FIP-1-CPSF30).23, 24 CPSF30 binding to the PAS is known to be a key step in the maturation of eukaryotic mRNAs in cells, and these studies of CPSF30 at the molecular level underscore the significance of the AU hexamer sequence. There have been several reports of variants within the AU-rich hexanucleotide sequence in diseases that involve improper 3’ end processing.20, 30-37 These include α and β thalassemia, blood disorders that result in absent or lowered concentrations of α or β globin chains;31, 32 IPEX (immune dysregulation, polyendocrinopathy, enteropathy, X-linked) syndrome, a disease that stems from a PAS variant in the FOXP3 gene leading to immunodeficiencies;33 Fabry disease, which results from a disruption in glycosphingolipid catabolism due to a PAS variant in the α-galactosidase A gene;34 and lupus, an autoimmune disease that involves a PAS variant in the GIMAP5 gene.37 We sought to determine whether these disease states are manifested by abrogation of binding between the CCCH domains of CPSF30 and the mutated PAS AU hexanucleotide RNA.
Utilizing a construct of CPSF30 that contains just the five CCCH domains (CPSF30-5F), which we previously reported to selectively recognize the AU-hexamer when loaded with Zn and a 2Fe-2S cluster,13 we measured the affinity of CPSF30-5F for five disease-related RNA PAS variants (α-thalassemia (ATHA, sequence: AAUAAG), β-thalassemia (BTHA, sequence: AACAAA), IPEX (AAUGAA), Fabry (AUUAAG), and Lupus (AAUAGA)). The experiments utilized fluorescence anisotropy (FA) and involved titrating fluorescently labeled RNA molecules that correspond to the PAS variants with CPSF30-5F (Table 2).25 As Table 2 and Figure 2 show, CPSF30-5F/RNA binding was abrogated for three of the variants: IPEX, Fabry and Lupus. In contrast, binding was retained for the α and β thalassemia variants and was of the same order of magnitude as that observed for the native PAS sequence (Table 2). All of the variants evaluated were single base variants within the AU hexamer sequence, except for the Fabry variant, which had two base modifications, at positions 2 and 6 (P2AtoU and P6AtoG). Notably, these single variants in positions 2 and 6 alone do not abrogate CPSF30-5F/RNA binding, indicating that variants at both positions within the hexamer must be present to abolish binding (Tables 2 and 3). Taken together, we suggest that the loss in binding affinity of CPSF30’s CCCH ZFs to the IPEX, Fabry, and Lupus disease-related RNA variants may contribute to the mechanisms of these diseases, whereas CPSF30 may not play a key role in ATHA and BTHA.
Table 2.
Sequences of the RNA oligonucleotides corresponding to disease variants related to the PAS and the binding constants ([P]1/2) and Hill coefficients determined from titrations with CPSF30-5F fit to a cooperative binding model (± SD).
Name | Sequence A1A2U3A4A5A6 (5’→ 3’) | Hill coefficient | [P]1/2 (nM) |
---|---|---|---|
α-syn30 | UCUCACUUUAAUAAUAAAAAUCAUGCUUAU | 1.8a | 178 ± 24a |
ATHA | UCUCACUUUAAUAAUAAGAAUCAUGCUUAU | 2.0 | 135 ± 6.3 |
BTHA | UCUCACUUUAAUAACAAAAAUCAUGCUUAU | 2.1 | 148 ± 9.0 |
IPEX | UCUCACUUUAAUAAUGAAAAUCAUGCUUAU | - | n.b |
LUPUS | UCUCACUUUAAUAAUAGAAAUCAUGCUUAU | - | n.b |
FABRY | UCUCACUUUAAUAUUAAGAAUCAUGCUUAU | - | n.b |
a From ref 25.
Figure 2.
(A) Cryo-EM structure of CPSF160, WDR33, CPSF30 and RNA, focused on CPSF30 (ZF1,2,3) bound to RNA (sequence 5’-AACCUCCAAUAAACAAC-3’). Panels show the interactions of each nucleotide of the AAUAAA hexamer with amino acids from CPSF30. Hydrogen bonds between bases and amino acids are shown by yellow dotted lines. The figure was generated in Pymol (PDB 6BLL). (B) Oligonucleotide sequences of the PAS hexamers investigated in this study with variants from the canonical AU-hexamer highlighted in red. (C) Fluorescence anisotropy titrations of PAS RNA variant sequences with CPSF30-5F (α-syn30 green squares, ATHA dark blue circles, BTHA purple triangles, IPEX light blue diamonds, LUPUS red circles, FABRY brown inverted triangles). (D) Fluorescence anisotropy titrations of single nucleotide PAS RNA variant sequences with CPSF30-5F (α-syn30 green squares, P2AtoG red circles, P2AtoU purple triangles, P4AtoC inverted orange triangles, P5AtoC dark blue diamonds, P6AtoC grey squares). An average of three titrations is plotted, and the error is shown as the standard error of the mean (SEM). Data are fit to a cooperative binding model.
Table 3.
RNA oligonucleotide sequences of variants within the PAS and the binding constants ([P]1/2) and Hill coefficients determined from titrations with CPSF30-5F fit to a cooperative binding model (± SD).
Name | Sequence (5’→ 3’) | Hill Coefficient | [P]1/2 (nM) |
---|---|---|---|
P2AtoG | UCUCACUUUAAUAGUAAAAAUCAUGCUUAU | 1.4 | 376 ± 12 |
P2AtoU | UCUCACUUUAAUAUUAAAAAUCAUGCUUAU | 1.2 | 207 ± 0.60 |
P4AtoC | UCUCACUUUAAUAAUCAAAAUCAUGCUUAU | 1.4 | 335 ± 38 |
P5AtoC | UCUCACUUUAAUAAUACAAAUCAUGCUUAU | 1.2 | 278 ± 29 |
P6AtoC | UCUCACUUUAAUAAUAACAAUCAUGCUUAU | 1.4 | 309 ± 26 |
To better understand why the α and β thalassemia variants (ATHA and BTHA) did not exhibit loss in binding to CPSF30-5F, we mapped the mutated ATHA and BTHA RNA sequences onto a recent cryo-EM structure of the CPSF complex. In these structures, CPSF30 is partially visible with ZFs 1, 2 and 3 present. CPSF30’s CCCH ZF domains 2 and 3 interact directly with adenosines 1, 2, 4, and 5 of the hexamer (AAUAAA).28, 29 The ATHA and BTHA variants are in positions 3 and 6 of the hexamer, which are not involved in direct binding with CPSF30 (Figure 2A). Positions 3 and 6 are recognized by WDR33 in the CPSF complex, therefore variants in this region likely affect other partners in the CPSF complex.28, 29
Effects of single PAS variants on CPSF30-5F/RNA binding.
The finding that single nucleotide variations within the AU-hexamer RNA sequence can abrogate binding between CPSF30-5F and RNA led to the question: how important is sequence conservation within the AU hexamer for CPSF30 binding? To address this question, we examined variants in positions 4 and 5 of the AU hexamer (Table 3). These positions were chosen because the disease variants revealed that A to G transitions (purine to purine) in those positions abrogated binding. The cryo-EM structure indicates that the amine groups of A4 and A5 hydrogen bond to the E95 and N107 amino acids in CPSF30, respectively (Figure 2). The variant chosen was an A to C, or purine to pyrimidine, change within the hexamer. This variant is predicted to retain hydrogen bonding via an amine group, and we hypothesized that CPSF30-5F/RNA binding would be retained. This prediction was confirmed by the FA data: CPSF30-5F bound to the PAS RNA sequences with the A to C variations in positions 4 and 5 of the AU hexamer, albeit with slightly weaker affinities than those measured for the native RNA target sequence (Table 3). These results support a role for hydrogen bonding in high affinity binding of CPSF30 to the PAS hexamer.
We also examined a P2AtoU variation. This variant was chosen because AUUAAA is the second most common PAS hexamer, at ~10% of human polyA sites, and we predicted that the variant would not affect CPSF30/PAS RNA binding. The FA data supported this prediction: the AUUAAA sequence exhibited CPSF30 binding comparable to that of the AAUAAA sequence ([P]1/2 of 207 ± 0.60 nM versus 178 ± 24 nM) (Table 3). We note that in nuclear extracts the AUUAAA polyadenylation site exhibits 77% of the processing efficiency of the canonical site, where processing efficiency is defined by Sheets et al. as the percentage of polyadenylated product of a mutated PAS as compared to the canonical site AAUAAA, suggesting a potential connection between binding affinity and processing efficiency (Table S1).22 The affinity of CPSF30 for a P2AtoG variant was also examined. Here, the prediction was that the variant would abrogate binding because this same A to G variant in positions 4 and 5 of the hexamer abrogates binding; however, the P2AtoG variant retained RNA binding, although it was weaker than that measured for the native RNA target (Table 3). This suggests that the identity of the base in position 2 can be variable. On the cellular level, there is a body of literature that has measured polyadenylation efficiency mediated by CPSF30 (as part of a nuclear extract that contains the other CPSF proteins) as a function of RNA sequence.22 In these studies, it has been reported that guanine variants in positions 3, 4, and 5 of the AAUAAA hexamer significantly attenuate polyadenylation efficiency, to <6% of the efficiency of AAUAAA, whereas this same guanine variant in position 2 (P2AtoG transition) only lowers the polyadenylation efficiency to 29% of that of the canonical AAUAAA hexamer.
These data suggest that the identity of the base in position two is not as important for function.22 Thus, the finding that the P2AtoG transition does not severely affect CPSF30/RNA binding may be connected to polyadenylation efficiency.
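To make the comparison between binding and processing concrete, the [P]1/2 values in Tables 2 and 3 can be expressed as fold changes relative to the native α-syn30 sequence and set beside the two processing efficiencies quoted above from Sheets et al. The sketch below only restates those reported numbers; the variable names are illustrative, and two data points are too few to establish a correlation.

```python
NATIVE_P_HALF_NM = 178.0  # alpha-syn30 (AAUAAA), Table 2

# [P]1/2 values (nM) for the single-nucleotide variants in Table 3.
VARIANT_P_HALF_NM = {
    "P2AtoG": 376.0,
    "P2AtoU": 207.0,
    "P4AtoC": 335.0,
    "P5AtoC": 278.0,
    "P6AtoC": 309.0,
}

# Processing efficiencies (% of AAUAAA) quoted in the text from Sheets et al.
PROCESSING_EFFICIENCY_PCT = {"P2AtoU": 77.0, "P2AtoG": 29.0}

fold_change = {
    name: value / NATIVE_P_HALF_NM for name, value in VARIANT_P_HALF_NM.items()
}

for name in sorted(fold_change, key=fold_change.get):
    efficiency = PROCESSING_EFFICIENCY_PCT.get(name)
    note = f", processing efficiency {efficiency:.0f}%" if efficiency is not None else ""
    print(f"{name}: {fold_change[name]:.2f}-fold higher [P]1/2{note}")
```

All five single variants stay within roughly twofold of the native [P]1/2, and of the two position 2 variants, the one with the larger fold change (P2AtoG) is also the one with the lower reported processing efficiency.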
Bioinformatic analysis of PAS hexamers to identify common sequence variations.
The RNA binding studies revealed that CPSF30 will bind to sequences that are modified from the canonical AAUAAA hexamer PAS, and we sought to define the common sequence variations that are present within the PAS hexamer sequence. The PAS is known to have some sequence variability, and there have been several studies that utilized bioinformatics to identify this variability.41-44 Building upon these studies, we applied a bioinformatics approach to identify the PAS variability in pre-mRNAs from H. sapiens. We utilized the PolyASite 2.0 database, developed by Herrmann et al.,38 to identify the most abundant AU-hexamer type sequences. The pre-mRNA sequences that were parsed in our bioinformatics analysis were defined as those between −60 and +10 nucleotides relative to the polyA site.38 In some cases, more than one PAS was found in this region. In these cases, the most defined site was called the ‘representative’ site, and the remaining signals were called ‘clustered’ as described by Herrmann et al.38 The results were grouped in three manners: (i) site based frequency, (ii) cluster based frequency and (iii) mRNA expression weighted frequency (Table 4 and SI – XLSX file “PAS hexamer analysis”). Site based frequency is the ratio of each hexamer’s occurrence to all signals found, cluster based frequency is the fraction of clusters containing the hexamer, and mRNA expression weighted frequency is the fraction of the sequence present in the total number of mRNAs that are expressed. The sequences identified from our analysis are listed in Table 4. Each PAS hexamer sequence is presented with a value assigned based upon the type of grouping. A value of 1 would mean that the sequence is present 100% of the time. The most abundant sequences are colored red and the least abundant sequences are colored green, with gradations between the two colors reflective of the frequency of each sequence.
Notably, in all three groupings the most abundant sequence is the canonical hexamer, AAUAAA, as predicted. Sequence logos were generated for each grouping and are shown at the top of Table 4. In these logos, the base at position 2 is the least conserved, which is consistent with the CPSF30/RNA binding data whereby variants at this position have little effect on binding affinity (vide supra).
Table 4.
Heat maps generated from bioinformatic analyses of the most frequent PAS hexamer sequences in H. sapiens based upon site, cluster, and mRNA expression weighted frequencies. A heat map is shown going from most frequent to least frequent (red-yellow-green respectively).
PAS Signal | Site based frequency | Cluster based frequency | mRNA expression weighted frequency |
---|---|---|---|
Sequence logo | [logo image not reproduced] | [logo image not reproduced] | [logo image not reproduced] |
AAUAAA | 0.2977 | 0.5885 | 0.4790 |
AUUAAA | 0.1031 | 0.2039 | 0.1096 |
AAGAAA | 0.0795 | 0.1572 | 0.0433 |
AAUAAU | 0.0591 | 0.1168 | 0.0420 |
AACAAA | 0.0587 | 0.1160 | 0.0375 |
UAUAAA | 0.0581 | 0.1148 | 0.0396 |
AAUAUA | 0.0511 | 0.1010 | 0.0284 |
AAUGAA | 0.0429 | 0.0848 | 0.0412 |
AGUAAA | 0.0360 | 0.0712 | 0.0282 |
AAUACA | 0.0320 | 0.0632 | 0.0121 |
AUUAUA | 0.0309 | 0.0611 | 0.0185 |
CAUAAA | 0.0296 | 0.0586 | 0.0215 |
GAUAAA | 0.0245 | 0.0483 | 0.0143 |
ACUAAA | 0.0234 | 0.0462 | 0.0229 |
AAUAGA | 0.0204 | 0.0403 | 0.0114 |
AAUAAG | 0.0200 | 0.0396 | 0.0127 |
AUUACA | 0.0176 | 0.0348 | 0.0121 |
AACAAG | 0.0153 | 0.0302 | 0.0159 |
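The observation that position 2 is the least conserved can be checked directly against the site based frequencies in Table 4. The sketch below is a simplified stand-in for a sequence logo, not the logo algorithm itself: for each position it sums the frequencies of the listed hexamers that carry the consensus base. The function and variable names are illustrative, and the calculation is limited to the 18 hexamers included in the analysis.

```python
# Site based frequencies from Table 4.
SITE_FREQUENCY = {
    "AAUAAA": 0.2977, "AUUAAA": 0.1031, "AAGAAA": 0.0795, "AAUAAU": 0.0591,
    "AACAAA": 0.0587, "UAUAAA": 0.0581, "AAUAUA": 0.0511, "AAUGAA": 0.0429,
    "AGUAAA": 0.0360, "AAUACA": 0.0320, "AUUAUA": 0.0309, "CAUAAA": 0.0296,
    "GAUAAA": 0.0245, "ACUAAA": 0.0234, "AAUAGA": 0.0204, "AAUAAG": 0.0200,
    "AUUACA": 0.0176, "AACAAG": 0.0153,
}
CONSENSUS = "AAUAAA"


def consensus_fraction_by_position(frequencies, consensus):
    """Frequency-weighted fraction of hexamers matching the consensus base."""
    total = sum(frequencies.values())
    fractions = []
    for index, base in enumerate(consensus):
        matching = sum(
            freq for hexamer, freq in frequencies.items() if hexamer[index] == base
        )
        fractions.append(matching / total)
    return fractions


fractions = consensus_fraction_by_position(SITE_FREQUENCY, CONSENSUS)
least_conserved_position = fractions.index(min(fractions)) + 1
for position, fraction in enumerate(fractions, start=1):
    print(position, round(fraction, 3))
```

With these values, position 2 has the lowest consensus fraction and position 4 the highest.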
Bioinformatics of AU hexamer flanking regions.
The alpha-synuclein pre-mRNA sequence utilized in these studies of CPSF30/RNA binding contains AU sequences at the immediate 3’ and 5’ flanking regions (AAU). Inspection of other pre-mRNA sequences revealed that AU rich sequences often extend beyond the AU-hexamer. This suggests that pre-mRNA sequences have a built-in redundancy. This sequence redundancy could be important for CPSF30/RNA binding or for mitigating any effects of variants to the AU hexamer motif on CPSF30/RNA binding. To determine whether our observation that AU sequences often extend beyond the AU hexamer is a general feature of all pre-mRNAs, we performed a bioinformatic analysis of the 5’ and 3’ flanking regions of the AAUAAA PAS hexamer. A total of 62,391 human gene transcript variants from the NCBI non-redundant database were examined, and the frequency with which each of the four bases occurred in each flanking region was determined. As shown in Table 5 and Figure S1, adenosines and uracils occurred on average twice as frequently as cytosines and guanines in the flanking regions. For both the 5’ and 3’ flanking regions, AAA was the most frequent flanking sequence and UCG was the least frequent (SI – XLSX file “PAS flanking region bioinformatics”).
Table 5.
(Top) Sequence of the PAS with the nucleotide positions of the 5’ and 3’ flanking regions indicated. (Bottom) Comparison of the frequency with which each nucleotide in the immediate 5’ and 3’ flanking regions of the PAS occurs. Data from a bioinformatic analysis utilizing the NCBI non redundant database.
[Image not reproduced: PAS sequence with flanking region positions P1 to P3 indicated]

Nucleotide | P1 | P2 | P3 |
---|---|---|---|
5’ Flanking region | | | |
A | 0.30 | 0.31 | 0.43 |
U | 0.35 | 0.36 | 0.24 |
C | 0.17 | 0.15 | 0.21 |
G | 0.17 | 0.17 | 0.12 |
3’ Flanking region | | | |
A | 0.39 | 0.32 | 0.27 |
U | 0.23 | 0.32 | 0.37 |
C | 0.15 | 0.16 | 0.18 |
G | 0.23 | 0.19 | 0.17 |
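The "twice as frequently" statement can be checked against the values in Table 5. The short sketch below sums the A and U frequencies and the C and G frequencies over all six flanking positions and takes their ratio; the data structure and function name are illustrative.

```python
# Per-position nucleotide frequencies (P1, P2, P3) from Table 5.
FLANK_FREQUENCIES = {
    "5'": {"A": [0.30, 0.31, 0.43], "U": [0.35, 0.36, 0.24],
           "C": [0.17, 0.15, 0.21], "G": [0.17, 0.17, 0.12]},
    "3'": {"A": [0.39, 0.32, 0.27], "U": [0.23, 0.32, 0.37],
           "C": [0.15, 0.16, 0.18], "G": [0.23, 0.19, 0.17]},
}


def au_to_gc_ratio(table):
    """Ratio of summed A+U frequency to summed C+G frequency."""
    au = sum(sum(side[base]) for side in table.values() for base in "AU")
    gc = sum(sum(side[base]) for side in table.values() for base in "GC")
    return au / gc


ratio = au_to_gc_ratio(FLANK_FREQUENCIES)
print(round(ratio, 2))
```

The ratio comes out slightly below 2, in line with the roughly twofold enrichment described in the text.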
The higher frequency with which adenine and uracil residues are present in the AU-hexamer flanking regions compared to guanine and cytosine suggests that they may contribute to CPSF30/RNA binding. To test this hypothesis, fluorescence anisotropy (FA) binding assays were conducted for CPSF30-5F with α-syn30 variants in which the flanking regions of α-syn30 were mutated to AAA and UCG. CPSF30-5F bound to the α-syn30 AAA flanking region variant with a [P]1/2 of 226 ± 1.6 nM (compared to 178 ± 24 nM for native α-syn30), while binding of CPSF30-5F to the α-syn30 UCG flanking region variant was weaker, at 309 ± 25 nM (Table 6). These data support a role for flanking region AU sequence in CPSF30/pre-mRNA binding. We propose that the AU-rich flanking sequences may serve either as redundant sequences to ensure PAS recognition or to fine tune the efficiency of one PAS versus another.
Table 6.
RNA oligonucleotide sequences derived from a bioinformatic analysis of the 5’ and 3’ flanking regions of the PAS and their apparent dissociation constants (± SD) to CPSF30-5F.
Name | Sequence (5’→ 3’) | Hill coefficient | CPSF30-5F [P]1/2 (nM) |
---|---|---|---|
Most freq fl | UCUCACUUUAAAAAUAAAAAACAUGCUUAU | 2.0 | 226 ± 1.6 |
Least freq fl | UCUCACUUUUCGAAUAAAUCGCAUGCUUAU | 1.5 | 309 ± 25 |
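The two parameters reported in Table 6, [P]1/2 and the Hill coefficient, are those of a Hill-type binding isotherm. The sketch below assumes the standard form, fraction bound = [P]^n / ([P]1/2^n + [P]^n); the exact fitting equation used for the FA data is not given in this section, so this should be read as an illustration of how the two parameters shape a binding curve rather than as the fitting procedure itself. The function name and the 250 nM example concentration are hypothetical.

```python
def hill_fraction_bound(protein_nm, p_half_nm, hill_n):
    """Fraction of RNA bound at a given protein concentration (standard Hill form).

    protein_nm : protein concentration in nM
    p_half_nm  : protein concentration at half-maximal binding, [P]1/2, in nM
    hill_n     : Hill coefficient
    """
    if protein_nm < 0 or p_half_nm <= 0:
        raise ValueError("concentrations must be non-negative and [P]1/2 positive")
    if protein_nm == 0:
        return 0.0
    ratio = (protein_nm / p_half_nm) ** hill_n
    return ratio / (1.0 + ratio)


# Parameters taken from Table 6: ([P]1/2 in nM, Hill coefficient).
TABLE6 = {
    "Most freq fl": (226.0, 2.0),
    "Least freq fl": (309.0, 1.5),
}

# Compare predicted occupancy of the two variants at one example concentration.
example_protein_nm = 250.0  # hypothetical concentration chosen for illustration
occupancy = {
    name: hill_fraction_bound(example_protein_nm, p_half, n)
    for name, (p_half, n) in TABLE6.items()
}
```

At any protein concentration equal to [P]1/2 the function returns 0.5 by construction, and at the example concentration the AAA-flanked variant is predicted to be more fully bound than the UCG-flanked variant, consistent with the ordering of affinities in Table 6.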
CPSF30-5F binds to AU-rich element (ARE) RNA sequences.
CPSF30 belongs to the CCCH class of ZF proteins. The best studied protein in this class is tristetraprolin (TTP), which regulates mRNA transcripts by selectively recognizing and binding to the mRNA sequence UUUAUUUAUUU (also called the ARE), which is present in the 3’ untranslated region of mRNA.15, 45 This ZF/RNA binding event is important for regulating the stability of the mRNA and its ability to be marked for rapid degradation during stress responses.10, 12, 26, 46 Given the high sequence homology between the CCCH ZF domains of CPSF30 and TTP, we predicted that CPSF30 would also bind to the ARE sequence (Figure S2). To test this hypothesis, the affinities of CPSF30-5F for a series of oligonucleotides containing the ARE sequence (from TNF-α mRNA) and ranging in length from 16 to 38 nucleotides were measured. Tight binding, on par with that reported for the AU hexamer, was observed for oligonucleotides with a length of 24 nucleotides or longer (Table 7, Figure S3). These data contrast with our previously obtained data for non-ARE sequences: CPSF30 does not bind to control RNAs including GU-rich, polyU, polyC, Rβ31, and others.13, 25 The finding that both CPSF30 and TTP selectively recognize AU-rich RNA sequences suggests that other ZFs that contain CCCH domains may also preferentially bind AU-rich RNA sequences. Future experiments will test this hypothesis.
Table 7.
RNA oligonucleotide sequences from the ARE region of TNF-α and their relative dissociation constants (± SD) to CPSF30-5F.
Name | Sequence (5’→ 3’) | Hill coefficient | [P]1/2 (nM) |
---|---|---|---|
ARE-16 | AUUAUUUAUUUAUUUA | 3.6 | 1000 ± 82 |
ARE-18 | UUAUUAUUUAUUUAUUUA | 2.1 | 671 ± 49 |
ARE-20 | UUUAUUAUUUAUUUAUUUAG | 2.1 | 499 ± 37 |
ARE-24 | AUUUAUUUAUUAUUUAUUUAUUUA | 1.8 | 265 ± 31 |
ARE-30 | UUUAUUAUUUAUUUAUUAUUUAUUUAUUUA | 3.0 | 163 ± 10 |
ARE-38 | GUGAUUAUUUAUUAUUUAUUUAUUAUUUAUUUAUUUAG | 2.5 | 141 ± 5.0 |
Addition of the zinc knuckle domain to CPSF30 restores RNA binding.
Full length CPSF30 contains a ‘CCHC’ zinc knuckle domain at the C-terminus, in addition to the five CCCH domains that make up CPSF30-5F. Zinc knuckle domains often recognize U-rich sequences, and we have reported that full length CPSF30 binds to a polyU RNA sequence, in addition to the AU hexamer.25 We have also found that when 4 of the 6 residues within the hexamer are modified (CCUCCA) in the context of α-syn, full length CPSF30 still binds to PAS RNA, albeit with weaker affinity. This was attributed to the presence of a polyU sequence outside of the hexamer of α-syn RNA.25 Based upon these results, we proposed that the loss of RNA binding observed here for CPSF30-5F with the PAS variants would be recovered by addition of the zinc knuckle domain to the protein construct (CPSF30-FL). FA experiments confirmed this hypothesis: as shown in Table S2, CPSF30-FL bound to all of the RNA variants. Pre-mRNA sequences contain U-rich motifs at the 3’ end, in addition to the AU hexamer sequence, and the RNA binding observed for full length CPSF30 is likely due to the zinc knuckle domain binding to the U-rich sequence present in α-syn RNA.
The finding that full length CPSF30 binds to both an AU hexamer sequence and a polyU sequence raises the question of whether one sequence is preferentially recognized over the other. FA-monitored competition experiments were performed to determine whether there is a preference. In the first experiment, the CPSF30-FL/α-syn24 RNA complex was formed (with fluorescently labeled α-syn24), after which non-fluorescently labeled polyU RNA was titrated and the effect on fluorescence anisotropy was measured. In the second experiment, the CPSF30-FL/polyU RNA complex was formed (with fluorescently labeled polyU), after which non-fluorescently labeled α-syn24 RNA was titrated and the effect on fluorescence anisotropy was measured. As shown in Figures 3 and S4, significantly less polyU RNA has to be titrated into CPSF30-FL/α-syn24 (20 nM) than AU hexamer RNA (α-syn24) into CPSF30-FL/polyU (7015 nM) to displace the starting RNA sequence, a roughly 350-fold difference. These results reveal that CPSF30-FL preferentially binds to the polyU RNA.
Figure 3.
Competitive titration of unlabeled α-syn24 with 5 nM polyU-F and 198 nM CPSF30-FL (purple); competitive titration of unlabeled polyU with 5 nM α-syn24-F and 301 nM CPSF30-FL (red). Experiments were performed in 20 mM Tris, 100 mM NaCl, 0.3 mg/ml Poly(C), and 0.1 mg/ml BSA, pH 8 buffer. Inset: Expanded view of the titration data between 0-80 nM titrant RNA.
CONCLUSIONS
CPSF30 is a multi-domain ZF protein that contains both CCCH and CCHC domains, along with an Fe-S co-factor. CPSF30 binds to two highly conserved pre-mRNA sequences: AAUAAA (the PAS hexamer) and a U-rich (polyU) motif. We have identified single nucleotide variants within the PAS hexamer that severely attenuate and, in some cases, abrogate CPSF30-5F/RNA binding. Some of the variants are associated with disease states that involve altered PAS signals. These findings suggest a potential link between CPSF30 function and these diseases. We have also demonstrated that CPSF30 binds to AU-rich RNA sequences with high affinity and selectivity. These results support a growing body of evidence that CCCH type ZFs preferentially bind AU-rich RNA sequences. As such, we predict that newly identified CCCH type ZFs will be found to target AU-rich RNA sequences when they are isolated and studied experimentally.
The addition of the C-terminal zinc knuckle domain to CPSF30-5F (forming CPSF30-FL) restores binding to pre-mRNA sequences in which the PAS hexamer has been mutated. The zinc knuckle domain recognizes a polyU sequence on pre-mRNA, and these results suggest a hierarchy in RNA binding by CPSF30, whereby polyU binding occurs preferentially to PAS binding. We propose that the hierarchy of RNA binding observed in vitro is connected to the accessibility of specific sequence motifs within the pre-mRNA targets of CPSF30. During pre-mRNA processing, U-rich RNA binding proteins are often expressed and regulate pre-mRNA via direct protein/RNA binding interactions. When these U-rich RNA binding proteins are present and bound to pre-mRNA, U-rich sequences will be inaccessible for CPSF30 binding. Consequently, CPSF30 will bind to the AU hexamer of the PAS. This hypothesis has support in the biological literature. For example, when the polyU binding protein HNRNPC is absent (downregulated), PAS sites with U-rich motifs in close proximity are used more often in pre-mRNA processing, suggesting that CPSF30 is involved in binding to these motifs.42 Taken together, we propose a model whereby U-rich binding proteins block binding of CPSF30 to the polyU sequence by rendering it inaccessible, thereby promoting PAS binding. When the U-rich binding proteins are downregulated, the polyU sequence is made accessible to CPSF30 and direct binding to polyU occurs (Figure 4). In addition, during oogenesis, CPSF30 has been found to be involved in cytoplasmic polyadenylation. For this process to occur efficiently, both AAUAAA and a U-rich motif are needed, suggesting an additional role of multi-RNA binding by CPSF30.47-50 Future studies in cells will test these predictions.
Figure 4.
Possible mechanism by which downregulation of the U-rich binding protein HNRNPC leads to increased usage of PAS sites that have U-rich regions in close proximity to the PAS.
Supplementary Material
ACKNOWLEDGMENTS
SLJM is grateful to the NSF (CHE-1708732) for support of this work, and JDP acknowledges an NIH sponsored CBI training grant (GM066706) and the AFPE for support. Additional support was provided by the University of Maryland School of Pharmacy Mass Spectrometry Center (SOP1841-IQB2014). Biorender.com was used for designing the TOC graphic, Figure 1, and Figure 4.
Footnotes
Supporting Information
The supporting information is available free of charge at https://pubs.acs.org.
Supplemental figures S1-S3, comprising a bar graph showing the flanking region FA data, a bar graph describing the ARE FA data, and a plot describing further polyU and α-syn24 competition studies; and supplemental tables S1-S2, comprising a comparison of this paper to the work of Sheets et al. on polyadenylation efficiency and data from FA studies with the PAS variants and full length CPSF30 investigated in this work (PDF)
Tables providing additional details of the PAS hexamer bioinformatics (XLSX)
Tables providing additional details of the PAS flanking region bioinformatics (XLSX)
The authors declare no competing interests.
Accession Codes
CPSF30: UniProt- O19137
REFERENCES
- [1]. Decaria L, Bertini I, and Williams RJ (2010) Zinc proteomes, phylogenetics and evolution, Metallomics 2, 706–709.
- [2]. Kluska K, Adamczyk J, and Krężel A (2018) Metal binding properties, stability and reactivity of zinc fingers, Coord. Chem. Rev 367, 18–64.
- [3]. Lee SJ, and Michel SLJ (2014) Structural Metal Sites in Nonclassical Zinc Finger Proteins Involved in Transcriptional and Translational Regulation, Acc. Chem. Res 47, 2643–2650.
- [4]. Searles MA, Lu D, and Klug A (2000) The role of the central zinc fingers of transcription factor IIIA in binding to 5 S RNA, J. Mol. Biol 301, 47–60.
- [5]. Klug A (2010) The Discovery of Zinc Fingers and Their Applications in Gene Regulation and Genome Manipulation, Annu. Rev. Biochem 79, 213–231.
- [6]. Krishna SS, Majumdar I, and Grishin NV (2003) Structural classification of zinc fingers, Nucleic Acids Res. 31, 532–550.
- [7]. Matthews JM, and Sunde M (2002) Zinc Fingers--Folds for Many Occasions, IUBMB Life 54, 351–355.
- [8]. Maret W (2012) New perspectives of zinc coordination environments in proteins, J. Inorg. Biochem 111, 110–116.
- [9]. Maret W (2004) Zinc and sulfur: a critical biological partnership, Biochemistry 43, 3301–3309.
- [10]. Fu M, and Blackshear PJ (2017) RNA-binding proteins in immune regulation: a focus on CCCH zinc finger proteins, Nat. Rev. Immunol 17, 130–143.
- [11]. Maeda K, and Akira S (2017) Regulation of mRNA stability by CCCH-type zinc-finger proteins in immune cells, Int. immunol 29, 149–155.
- [12]. Lai WS, Carballo E, Strum JR, Kennington EA, Phillips RS, and Blackshear PJ (1999) Evidence that tristetraprolin binds to AU-rich elements and promotes the deadenylation and destabilization of tumor necrosis factor alpha mRNA, Mol. Cell. Biol 19, 4311–4323.
- [13]. Shimberg GD, Michalek JL, Oluyadi AA, Rodrigues AV, Zucconi BE, Neu HM, Ghosh S, Sureschandra K, Wilson GM, Stemmler TL, and Michel SLJ (2016) Cleavage and polyadenylation specificity factor 30: An RNA-binding zinc-finger protein with an unexpected 2Fe–2S cluster, Proc. Natl. Acad. Sci. U.S.A 113, 4700–4705.
- [14]. Kafasla P, Skliris A, and Kontoyiannis DL (2014) Post-transcriptional coordination of immunological responses by RNA-binding proteins, Nat. Immunol 15, 492–502.
- [15]. diTargiani RC, Lee SJ, Wassink S, and Michel SL (2006) Functional characterization of iron-substituted tristetraprolin-2D (TTP-2D, NUP475-2D): RNA binding affinity and selectivity, Biochemistry 45, 13641–13649.
- [16]. Barabino SM, Hubner W, Jenny A, Minvielle-Sebastia L, and Keller W (1997) The 30-kD subunit of mammalian cleavage and polyadenylation specificity factor and its yeast homolog are RNA-binding zinc finger proteins, Genes Dev. 11, 1703–1716.
- [17]. Yang Q, and Doublié S (2011) Structural biology of poly(A) site definition, Wiley Interdiscip. Rev. RNA 2, 732–747.
- [18]. Chan SL, Huppertz I, Yao C, Weng L, Moresco JJ, Yates JR 3rd, Ule J, Manley JL, and Shi Y (2014) CPSF30 and Wdr33 directly bind to AAUAAA in mammalian mRNA 3' processing, Genes Dev. 28, 2370–2380.
- [19]. Proudfoot NJ (2011) Ending the message: poly(A) signals then and now, Genes Dev. 25, 1770–1782.
- [20]. Chang JW, Yeh HS, and Yong J (2017) Alternative Polyadenylation in Human Diseases, Endocrinol. Metab. (Seoul, Repub. Korea) 32, 413–421.
- [21]. Thore S, and Fribourg S (2019) Structural insights into the 3'-end mRNA maturation machinery: Snapshot on polyadenylation signal recognition, Biochimie 164, 105–110.
- [22]. Sheets MD, Ogg SC, and Wickens MP (1990) Point mutations in AAUAAA and the poly (A) addition site: effects on the accuracy and efficiency of cleavage and polyadenylation in vitro, Nucleic Acids Res. 18, 5799–5805.
- [23]. Hamilton K, Sun Y, and Tong L (2019) Biophysical characterizations of the recognition of the AAUAAA polyadenylation signal, RNA 25, 1673–1680.
- [24]. Clerici M, Faini M, Aebersold R, and Jinek M (2017) Structural insights into the assembly and polyA signal recognition mechanism of the human CPSF complex, eLife 6, e33111.
- [25]. Pritts JD, Hursey MS, Michalek JL, Batelu S, Stemmler TL, and Michel SLJ (2020) Unraveling the RNA Binding Properties of the Iron–Sulfur Zinc Finger Protein CPSF30, Biochemistry 59, 970–982.
- [26]. Lai WS, Wells ML, Perera L, and Blackshear PJ (2019) The tandem zinc finger RNA binding domain of members of the tristetraprolin protein family, Wiley Interdiscip. Rev. : RNA 10, e1531.
- [27]. Shimberg GD, Pritts JD, and Michel SLJ (2018) Iron-Sulfur Clusters in Zinc Finger Proteins, Methods Enzymol. 599, 101–137.
- [28]. Clerici M, Faini M, Muckenfuss LM, Aebersold R, and Jinek M (2018) Structural basis of AAUAAA polyadenylation signal recognition by the human CPSF complex, Nat. Struct. Mol. Biol 25, 135–138.
- [29]. Sun Y, Zhang Y, Hamilton K, Manley JL, Shi Y, Walz T, and Tong L (2018) Molecular basis for the recognition of the human AAUAAA polyadenylation signal, Proc. Natl. Acad. Sci. U.S.A 115, E1419–E1428.
- [30]. Danckwardt S, Hentze MW, and Kulozik AE (2008) 3' end mRNA processing: molecular mechanisms and implications for health and disease, EMBO J. 27, 482–498.
- [31]. Higgs DR, Goodbourn SE, Lamb J, Clegg JB, Weatherall DJ, and Proudfoot NJ (1983) Alpha-thalassaemia caused by a polyadenylation signal mutation, Nature 306, 398–400.
- [32]. Orkin SH, Cheng TC, Antonarakis SE, and Kazazian HH Jr. (1985) Thalassemia due to a mutation in the cleavage-polyadenylation signal of the human beta-globin gene, EMBO J. 4, 453–456.
- [33]. Bennett CL, Brunkow ME, Ramsdell F, O'Briant KC, Zhu Q, Fuleihan RL, Shigeoka AO, Ochs HD, and Chance PF (2001) A rare polyadenylation signal mutation of the FOXP3 gene (AAUAAA->AAUGAA) leads to the IPEX syndrome, Immunogenetics 53, 435–439.
- [34]. Yasuda M, Shabbeer J, Osawa M, and Desnick RJ (2003) Fabry disease: novel alpha-galactosidase A 3'-terminal mutations result in multiple transcripts due to aberrant 3'-end formation, Am. J. Hum. Genet 73, 162–173.
- [35]. Turner RE, Pattison AD, and Beilharz TH (2017) Alternative polyadenylation in the regulation and dysregulation of gene expression, Semin. Cell Dev. Biol 75, 61–69.
- [36]. Tian B, and Manley JL (2017) Alternative polyadenylation of mRNA precursors, Nat. Rev. Mol. Cell Biol 18, 18–30.
- [37]. Hellquist A, Zucchelli M, Kivinen K, Saarialho-Kere U, Koskenmies S, Widen E, Julkunen H, Wong A, Karjalainen-Lindsberg ML, Skoog T, Vendelin J, Cunninghame-Graham DS, Vyse TJ, Kere J, and Lindgren CM (2007) The human GIMAP5 gene has a common polyadenylation polymorphism increasing risk to systemic lupus erythematosus, J. Med. Genet 44, 314–321.
- [38]. Herrmann CJ, Schmidt R, Kanitz A, Artimo P, Gruber AJ, and Zavolan M (2020) PolyASite 2.0: a consolidated atlas of polyadenylation sites from 3' end sequencing, Nucleic Acids Res. 48, D174–D179.
- [39]. Crooks GE, Hon G, Chandonia JM, and Brenner SE (2004) WebLogo: a sequence logo generator, Genome Res. 14, 1188–1190.
- [40]. Stephens SM, Chen JY, Davidson MG, Thomas S, and Trute BM (2005) Oracle Database 10g: a platform for BLAST search and Regular Expression pattern matching in life sciences, Nucleic Acids Res. 33, D675–679.
- [41]. Beaudoing E, Freier S, Wyatt JR, Claverie JM, and Gautheret D (2000) Patterns of variant polyadenylation signal usage in human genes, Genome Res. 10, 1001–1010.
- [42]. Gruber AJ, Schmidt R, Gruber AR, Martin G, Ghosh S, Belmadani M, Keller W, and Zavolan M (2016) A comprehensive analysis of 3' end sequencing data sets reveals novel polyadenylation signals and the repressive role of heterogeneous ribonucleoprotein C on cleavage and polyadenylation, Genome Res. 26, 1145–1159.
- [43]. Hoque M, Ji Z, Zheng D, Luo W, Li W, You B, Park JY, Yehia G, and Tian B (2013) Analysis of alternative cleavage and polyadenylation by 3' region extraction and deep sequencing, Nat. Methods 10, 133–139.
- [44]. Tian B, and Graber JH (2012) Signals for pre-mRNA cleavage and polyadenylation, Wiley Interdiscip. Rev. : RNA 3, 385–396.
- [45]. Carballo E, Lai WS, and Blackshear PJ (1998) Feedback inhibition of macrophage tumor necrosis factor-alpha production by tristetraprolin, Science 281, 1001–1005.
- [46]. Brewer BY, Malicka J, Blackshear PJ, and Wilson GM (2004) RNA sequence elements required for high affinity binding by the zinc finger domain of tristetraprolin: conformational changes coupled to the bipartite nature of AU-rich mRNA-destabilizing motifs, J. Biol. Chem 279, 27870–27877.
- [47]. Dickson KS, Thompson SR, Gray NK, and Wickens M (2001) Poly(A) polymerase and the regulation of cytoplasmic polyadenylation, J. Biol. Chem 276, 41810–41816.
- [48]. Dickson KS, Bilger A, Ballantyne S, and Wickens MP (1999) The cleavage and polyadenylation specificity factor in Xenopus laevis oocytes is a cytoplasmic factor involved in regulated polyadenylation, Mol. Cell. Biol 19, 5707–5717.
- [49]. Bilger A, Fox CA, Wahle E, and Wickens M (1994) Nuclear polyadenylation factors recognize cytoplasmic polyadenylation elements, Genes Dev. 8, 1106–1116.
- [50]. de Moor CH, and Richter JD (1999) Cytoplasmic polyadenylation elements mediate masking and unmasking of cyclin B1 mRNA, EMBO J. 18, 2294–2303.