Abstract
Background
Cytoplasmic polyadenylation element binding proteins (Cpebs) are a family of proteins that bind to defined groups of mRNAs and regulate their translation. While Cpebs were originally identified as important features of oocyte maturation, recent interest is due to their prospective roles in neural system plasticity.
Results
In this study we made use of bioinformatic tools and methods including NCBI Blast, UCSC Blat, and Invitrogen Vector NTI to comprehensively analyze all known isoforms of four mouse Cpeb paralogs extracted from the national UniGene, UniProt, and NCBI protein databases. We identified multiple alternative splicing variants for each Cpeb. Regions of commonality and distinctiveness were evident when comparing Cpeb2, 3, and 4. In addition, we performed cross-ortholog comparisons among multiple species. The exon patterns were generally conserved across vertebrates. Mouse and human isoforms were compared in greater detail as they are the most represented in the current databases. The homologous and distinct regions are strictly conserved in mouse Cpeb and human CPEB proteins. Novel variants were proposed based on cross-ortholog comparisons and validated using biological methods. The functions of the alternatively spliced regions were predicted using the Eukaryotic Linear Motif resource.
Conclusions
Together, the large number of transcripts and proteins indicates the presence of a hitherto unappreciated complexity in the regulation and functions of Cpebs. The evolutionary retention of variable regions as described here is most likely an indication of their functional significance.
Keywords: in silico, Cpeb, bioinformatics, isoforms, paralogs, orthologs, alternative splicing
Introduction
Cytoplasmic polyadenylation element binding proteins are a family of mRNA binding proteins that play essential regulatory roles in the translation of defined mRNAs. First discovered in studies of oocyte maturation,1 Cpeb-mediated control of translation has since been implicated in a wider variety of processes, including cell cycling2,3 and synaptic plasticity.4 The identification of Cpebs in a wide variety of tissues5,6 indicates that they may function as a ubiquitous means for controlling the translation of specifically targeted mRNAs.
Four Cpeb paralogs have been identified in mouse. The first family member, Cpeb1, was identified using single-step RNA affinity chromatography. Enriched in oocytes, it is indispensable for cytoplasmic poly(A) elongation during oocyte maturation.1 Transcripts for Cpeb2 were first identified in mouse testis using an EST database and degenerate PCR.1,7 Cpeb3 and Cpeb4 were first detected in mouse brain via PCR and Northern blotting using primers/probes similar to human CPEB-like sequences.5 The N termini of Cpeb1–4 are highly variable, whereas the C termini, where RNA recognition motifs (RRMs) reside, are more conserved. Sequence analysis has revealed that Cpeb1 is distant from Cpeb2, 3 and 4 in the family tree.5 Expression of Cpeb1, 2, 3 and 4 mRNAs in the hippocampus demonstrated overlapping, yet distinct patterns.8 Cpeb3, in particular, has been associated with human memory.9 The cytoplasmic polyadenylation element (CPE), a short U-rich motif, has been identified in the 3′UTRs of mRNAs targeted by Cpeb1,10,11 while a distinct loop-forming U-rich motif appears to be indispensable for the binding of Cpeb4 and Cpeb3, but not of Cpeb1 protein.8
Previous biological findings suggested that Cpeb paralogs, although distinct in their own ways, may share some commonality in their structure and distribution, and may possibly provide some compensation and redundancy in their function. A systematic analysis of Cpebs based on the current databases and literature would surely be informative and instructive to ongoing Cpeb-related research. The purpose of the current study is to perform a comprehensive survey and analyses on three scales: within each paralog, across-paralog, and across-ortholog. Through data mining of the current nucleotide and protein databases and previous publications, we derived the alternative splicing patterns for each Cpeb. Some of the newly proposed alternatively spliced regions were confirmed experimentally. Cross-paralog and cross-ortholog comparisons illuminated the similarities and the unique attributes of four Cpebs, as well as the extraordinarily high level of conservation of each Cpeb across species. A bioinformatics analysis revealed the presence of specific functional motifs.
Results and Discussion
Cpeb1 protein isoforms with internal deletions of 1 or 5-amino acid (aa), or with an N-terminal truncation of 75-aa
A total of nine cDNA sequences for mouse Cpeb1 were extracted from the UniGene database (supplementary Table 1). Fragmented sequences and redundant sequences were identified with the bioinformatics tools Blast and Vector NTI and removed from further analysis. Four non-redundant full-length cDNAs were aligned to mouse genomic DNA (derived from the UCSC mouse genome) to infer exon-exon boundaries and to derive alternatively spliced exons (Fig. 1A). The comparison demonstrated that the variances in the lengths of the first and last exons lead to different 5′ UTRs or 3′ UTRs, respectively (Fig. 1A). Two variable sequences in the protein coding region (CDS), including a 3-nucleotide (nt) deletion resulting from partial exon 4 skipping and a 15-nt deletion resulting from partial exon 7 skipping, would lead to altered proteins. The presence of transcripts with or without the 15-nt variable region has been confirmed in mouse brain, ovary,12 and retina (Fig. 1B left).
Table S1.
mRNA | Size (bp) | Description | Protein | Size (aa) | Comments |
---|---|---|---|---|---|
AK077799.1 | 1737 | Mus musculus adult male thymus cDNA | BAC37017.1 | 140 (422–561) | Fragmented |
AK207851.1 | 402 | Mus musculus cDNA | – | | Disconnected translation |
AK199617.1 | 379 | Mus musculus cDNA | – | | Disconnected translation |
AK136088.1 | 2697 | Mus musculus in vitro fertilized eggs cDNA, product:cytoplasmic polyadenylation element binding protein 1, full insert sequence | Translated* | 561 | Possible misreading of 1 nucleotide; otherwise gives a product identical to NP_031781 |
AK135615.1 | 814 | Mus musculus in vitro fertilized eggs cDNA, product:cytoplasmic polyadenylation element binding protein 1, full insert sequence | – | | Disconnected translation |
BC125476.1 | 1777 | Mus musculus cytoplasmic polyadenylation element binding protein 1, mRNA, complete cds | AAI25477.1 | 562 | 1-aa insertion compared to NP_031781 |
BC144948.1 | 1759 | Mus musculus cDNA | Translated* | 556 | 5-aa deletion (PKGNM) compared to NP_031781 |
Y08260.1 | 2610 | M.musculus mRNA for CPEB protein | CAA69588.1 | 561 | |
NM_007755.4 | 2612 | Mus musculus cytoplasmic polyadenylation element binding protein 1 (cpeb1), mRNA | NP_031781.1 | 561 |
Two Cpeb1 protein sequences were extracted from the UniProt protein database and aligned using Vector NTI software (Fig. 1C). Meanwhile, we computationally translated all non-redundant full-length Cpeb1 transcripts with the aid of Vector NTI, and then compared the translated protein products to the protein sequences in the database. Our computational translation of cDNA BC144948.1 yielded a protein with a 5-aa deletion, which is not documented in the protein database (Fig. 1C). The removal of the 5-aa motif is due to partial skipping of exon 7 (15-nt) as previously described (Fig. 1A). In addition to the evidence at the transcript level in the mouse (Fig. 1B left), two isoforms of human CPEB1 proteins with the same 5-aa deletion13 were identified in the UniProt database (Fig. 1D). As we conclude later, the homology between mouse and human is stringent. In particular, the locations and sequences of the 5-aa deletion in mouse Cpeb1 and human CPEB1 are identical (Fig. 1C, 1D, Fig. 6D). Additional evidence includes the presence across vertebrates of 90-nt and 105-nt variants of a particular exon, which corresponds to exon 7 in mouse (supplementary Table 2, the asterisk). The stringent conservation of exon patterns across vertebrates strongly supports the presence of the 15-nt (5-aa) alternative splicing within this exon, which is likely to have important functional implications.
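The comparison between a computationally translated product and a reference protein can be illustrated with a minimal Python sketch. It is not the Vector NTI procedure used in this study; it only assumes that the variant differs from the reference by a single contiguous deletion. The function name and the flanking residues of the toy sequences are hypothetical; only the PKGNM motif is taken from supplementary Table 1.

```python
def find_single_deletion(reference: str, variant: str):
    """Return (start, end, segment) of the one block missing from `variant`,
    as 1-based inclusive positions in `reference`, or None if the two
    sequences do not differ by exactly one contiguous deletion."""
    if len(variant) >= len(reference):
        return None
    gap = len(reference) - len(variant)

    # Length of the shared N-terminal prefix.
    prefix = 0
    while prefix < len(variant) and reference[prefix] == variant[prefix]:
        prefix += 1

    # Length of the shared C-terminal suffix that does not overlap the prefix.
    suffix = 0
    while (suffix < len(variant) - prefix
           and reference[-1 - suffix] == variant[-1 - suffix]):
        suffix += 1

    # One clean deletion means prefix + suffix account for the whole variant.
    if prefix + suffix != len(variant):
        return None
    return prefix + 1, prefix + gap, reference[prefix:prefix + gap]


# Toy example: hypothetical flanks around the PKGNM motif.
toy_reference = "MSTLEDPKGNMVFRGQ"
toy_variant = "MSTLEDVFRGQ"
print(find_single_deletion(toy_reference, toy_variant))  # (7, 11, 'PKGNM')
```

When the residues flanking a deletion repeat, the reported coordinates are one of several equivalent placements, so the result should be read together with the exon boundaries.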
Table S2.
C. elegans | 415 | 360 | 111 | 91 | 340 | 216 | 757 | NM_066650.4 | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Drosophila | 419 | 500 | 220 | 482 | 137 | 232 | 719 | 181 | 386 | 1584 | NM_079736.2 | |||
chicken | 49 | 178 | 189 | 227 | 250 | 114 | 90 | 137 | 199 | 95 | 81 | 111 | XM_413713.2 | |
chimpanzee | 62 | 1844 | 175 | 189 | 227 | 253 | 114 | 90 | 137 | 199 | 95 | 81 | 1504 | XM_001158685.1 |
cow | 189 | 189 | 227 | 253 | 114 | 90 | 137 | 199 | 95 | 81 | 1456 | XM_864691.3 | ||
frog | 91 | 178 | 201 | 218 | 253 | 114 | 105 | 137 | 199 | 95 | 81 | 1479 | NM_001017330.2 | |
horse | 183 | 192 | 227 | 253 | 114 | 90 | 137 | 199 | 95 | 81 | 1463 | XM_001498253.2 | ||
human | 122 | 175 | 189 | 227 | 253 | 114 | 90 | 137 | 199 | 95 | 81 | 1502 | NM_030594.3 | |
marmoset | 120 | 175 | 189 | 224 | 253 | 114 | 90 | 137 | 199 | 95 | 81 | 1469 | XM_002749169.1 | |
mouse | 43 | 175 | 189 | 224 | 253 | 114 | 105 | 137 | 199 | 95 | 81 | 997 | NM_007755.4 | |
orangutan | 256 | 175 | 189 | 227 | 253 | 114 | 90 | 137 | 199 | 95 | 81 | 1500 | NM_001132960.1 | |
pig | 175 | 189 | 227 | 253 | 114 | 105 | 137 | 199 | 95 | 81 | 111 | NM_001097510.1 | ||
rat | 44 | 175 | 189 | 224 | 253 | 114 | 105 | 137 | 199 | 95 | 81 | 999 | NM_001106276.1 | |
zebrafish | 30 | 169 | 192 | 209 | 253 | 114 | 105 | 137 | 199 | 95 | 81 | 1303 | NM_131427.1 | |
* |
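The grouping of species by the size of this exon can be sketched in Python. The sketch assumes that the rows of supplementary Table 2 list exon sizes in 5′ to 3′ order and can be aligned at their 3′ ends, so that the sixth exon from the end corresponds to mouse exon 7. The dictionary and function names are illustrative.

```python
from collections import defaultdict

# Last six exon sizes (nt) of each vertebrate row in supplementary Table 2.
# Assumption: rows are aligned at the 3' end, so element 0 of each list is
# the exon corresponding to mouse exon 7.
LAST_SIX_EXONS = {
    "chicken":    [90, 137, 199, 95, 81, 111],
    "chimpanzee": [90, 137, 199, 95, 81, 1504],
    "cow":        [90, 137, 199, 95, 81, 1456],
    "frog":       [105, 137, 199, 95, 81, 1479],
    "horse":      [90, 137, 199, 95, 81, 1463],
    "human":      [90, 137, 199, 95, 81, 1502],
    "marmoset":   [90, 137, 199, 95, 81, 1469],
    "mouse":      [105, 137, 199, 95, 81, 997],
    "orangutan":  [90, 137, 199, 95, 81, 1500],
    "pig":        [105, 137, 199, 95, 81, 111],
    "rat":        [105, 137, 199, 95, 81, 999],
    "zebrafish":  [105, 137, 199, 95, 81, 1303],
}


def group_by_exon_size(rows, index=0):
    """Map each observed exon size at `index` to the species that carry it."""
    groups = defaultdict(list)
    for species, sizes in rows.items():
        groups[sizes[index]].append(species)
    return dict(groups)


groups = group_by_exon_size(LAST_SIX_EXONS)
short_nt, long_nt = sorted(groups)
difference_nt = long_nt - short_nt

for size in (short_nt, long_nt):
    print(size, "nt:", ", ".join(groups[size]))
print("difference:", difference_nt, "nt =", difference_nt // 3, "aa")
```

Under this alignment the two variants differ by 15 nt, a multiple of three, which is consistent with an in-frame 5-aa difference.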
The 15-nt (5-aa) region is located within the first RRM (Fig. 1C). Further analysis showed that the 5-aa region is adjacent to the octamer consensus of the first RRM (Fig. 5, green box). The insertion or deletion of these 5 aa may have a pivotal impact on the specificity of Cpeb1, because the sequences surrounding the consensus of RRMs are important for the specificity of RNA binding.14,15 To what extent this 5-aa region impacts RNA binding is yet to be established.
Two alternative splicing isoforms containing a 75-aa N-terminal truncation were evident in human CPEB1 (Fig. 1D). With the knowledge that there is a near-identical conservation between human CPEB1 and mouse Cpeb1 (Fig. 6B–D), the question arises: Does the 75-aa truncation in human have a counterpart in mouse? Based on our theoretical translation with Vector NTI, we postulated that the 75-aa truncation in mouse could be derived from the removal of exon 2, which leads to a frame shift and an alternative translational initiation site (Fig. 1A). Therefore, we designed primers spanning this alternative region in mouse. A primer pair at the exon 1/3 junction and within exon 3 confirmed the presence of a Cpeb1 transcript with an exon 2 deletion in mouse retina (Fig. 1B right). This PCR-based evidence at the level of transcripts provided support for the presence of this novel isoform of mouse Cpeb1 with a 75-aa N-terminal truncation.
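The reading-frame consequence of a skipping event follows from simple arithmetic on the number of nucleotides removed, as in the following Python sketch. The exon sizes are the mouse row of supplementary Table 2, assumed here to list exons 1 to 12 in 5′ to 3′ order. The function name is illustrative, and the sketch further assumes that the removed nucleotides lie within the reading frame.

```python
# Mouse Cpeb1 exon sizes (nt) from supplementary Table 2 (NM_007755.4).
# Assumption: the values are exons 1-12 in 5' to 3' order.
MOUSE_CPEB1_EXONS = [43, 175, 189, 224, 253, 114, 105, 137, 199, 95, 81, 997]


def frame_effect(removed_nt: int) -> str:
    """Classify the removal of `removed_nt` coding nucleotides."""
    if removed_nt % 3 == 0:
        return f"in-frame, {removed_nt // 3} aa removed"
    return f"frameshift (remainder {removed_nt % 3})"


exon2_nt = MOUSE_CPEB1_EXONS[1]
print("exon 2 skipped:", frame_effect(exon2_nt))
print("partial exon 4 skipping:", frame_effect(3))
print("partial exon 7 skipping:", frame_effect(15))
```

A 175-nt exon is not a multiple of three, which is consistent with the frame shift postulated for the exon 2 deletion, whereas the 3-nt and 15-nt events remove whole codons.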
Cpeb2 protein isoforms with internal deletions of 30-aa or 8-aa
We extracted eight cDNA sequences for Cpeb2 from the UniGene database (supplementary Table 3). Three non-redundant, full-length sequences were used for further analysis (Fig. 2A). A recently updated sequence for one isoform (NM_175937.3) was also included in the diagram. The alignment demonstrated that the variances in the lengths of the first and last exons of Cpeb2 lead to different 5′ UTRs and 3′ UTRs, respectively. The partial skipping of exon 2 removes 3-nt from the 5′ UTR. The alternative splices of exon 4 and exon 7 remove 90-nt and 24-nt, respectively, from the coding region. Our RT-PCR results confirmed the expression of transcripts with or without the 90-nt variable region in adult mouse retina (Fig. 2B). Transcripts without the 24-nt region were not detected, perhaps due to competition for the same primers, which can lead to masking of the less abundant isoforms by the more dominant ones, as observed in Cpeb3.6 Alternatively, this could be due to a distinct tissue specificity and/or condition. The alternative use of these two regions was also observed in a number of other vertebrate species (supplementary Table 4, the asterisks).
Table S3.
mRNA | Size (bp) | Description | Protein | Size (aa) | Comments |
---|---|---|---|---|---|
AK076221.1 | 4536 | Mus musculus 15 days embryo head cDNA, | Translated* | 189 | Fragmented |
AK042065.1 | 2629 | Mus musculus 3 days neonate thymus cDNA, | Translated* | | 30-aa deletion, 8-aa insertion compared to NP_787951 |
AB100307.1 | 1942 | Mus musculus cpeb2 mRNA, complete cds | BAC57076.1 | 521 | |
NM_175937.2 | 1942 | Mus musculus cytoplasmic polyadenylation element binding protein 2 (cpeb2), mRNA | NP_787951.1 | 521 | |
AK164866.1 | 904 | Mus musculus 15 days embryo head cDNA, | | 89 | Fragmented |
AK154330.1 | 3576 | Mus musculus NOD-derived CD11c +ve dendritic cells cDNA, | – | | Disconnected translation |
BC107349.1 | 1801 | Mus musculus cytoplasmic polyadenylation element binding protein 2, mRNA | AAI07350.1 | 521 | 3-nt deletion in 5′ UTR |
BC107350.1 | 1801 | Mus musculus cytoplasmic polyadenylation element binding protein 2, mRNA | AAI07351.1 | 521 | 3-nt deletion in 5′ UTR |
Table S4.
C. elegans | 47 | 337 | 174 | 236 | 776 | 196 | NM_062835.2 | |||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
cow | 295 | 91 | 51 | 24 | 171 | 90 | 119 | 115 | 182 | 3994 | XM_001787297.1 | |||
horse | 289 | 91 | 51 | 24 | 171 | 90 | 119 | 115 | 182 | 3992 | XM_001498900.2 | |||
human (new) | 1662 | 282 | 91 | 51 | 171 | 90 | 119 | 115 | 182 | 4001 | NM_182646.2 | |||
human | 438 | 282 | 91 | 51 | 171 | 90 | 119 | 115 | 182 | 3998 | NM_182646.1 | |||
mouse (new) | 1626# | 282 | 90 | 91 | 51 | 171 | 90 | 119 | 115 | 182 | 4006 | NM_175937.3 | ||
mouse | 59 | 200 | 282 | 90 | 91 | 51 | 171 | 90 | 119 | 115 | 182 | 472 | NM_175937.2 | |
rat | 203 | 265 | 90 | 91 | 51 | 171 | 90 | 140 | 115 | 182 | 4003 | NM_001108361.1 | ||
zebrafish | 224 | 91 | 51 | 174 | 90 | 119 | 115 | 182 | 307 | NM_001177457.1 | ||||
* | * |
One mouse Cpeb2 protein sequence was documented in the UniProt database (Q812E0). An additional isoform which was recently updated in the NCBI protein database (but not in the UniProt database) was added to the top (Fig. 2C, NP_787951.2). Both sequences contain an 8-aa deletion resulting from the removal of exon 7 (Fig. 2A). When we translated non-redundant full-length Cpeb2 transcripts using Vector NTI, we identified a novel protein from the translation of cDNA AK042065.1 (Fig. 2C). This predicted isoform has a 30-aa deletion resulting from the removal of exon 4 (90-nt), which has been confirmed at the level of transcripts (Fig. 2B). In addition, we identified the same 30-aa deletion in human CPEB2 (Fig. 2D). Six human CPEB2 protein isoforms, including the one recently deposited in the NCBI protein database (NP_001170853.1), are distinguished by the presence or absence of three motifs of 22-aa, 30-aa, or 8-aa each (Fig. 2D). The sequences and the relative locations of the 30-aa and 8-aa in human CPEB2 are comparable to those in mouse. The strong homology between mouse and human (Fig. 6B–D) and the similar findings in human provided additional support for the presence of a mouse Cpeb2 isoform with a 30-aa deletion.
Both mouse and human Cpeb2 sequences were updated in the NCBI database during the preparation of this manuscript. Of particular interest, the updated sequences demonstrate the presence of an extra-long isoform of Cpeb2, almost double the previously published size (Fig. 2C, 2D, top isoforms). Our sequence alignment demonstrated that both the previous and the updated mouse isoforms are legitimate: the use of an extended exon 1 leads to the much longer N terminus in the newly uncovered isoform (Fig. 2A, insert). This may be of relevance to prior investigations of CPEB3, in which antibodies recognized the predicted protein at ~78 kD in western blots, but also detected an additional protein band above 100 kD6,8 (see below).
Cpeb3 protein isoforms with internal deletions of 23-aa or 8-aa, an N-terminal truncation of 216-aa, or a C-terminal truncation of 132-aa with an altered C terminus
Eight full-length cDNA sequences of mouse Cpeb3 were extracted from the UniGene database (supplementary Table 5). In addition, partial sequences of two transcripts were experimentally identified (Fig. 3A, sequences with dashed lines).5,6 Sequence alignments indicated that the alternative usage of exons 1–3 and variable length of exon 13 lead to different 5′ UTRs or 3′ UTRs, respectively (Fig. 3A). Five alternative splices within the CDS region involve intra-exon skipping of exon 4 (388-nt), partial skipping of exon 5 (69-nt), deletion of exon 7 (24-nt), deletion of exon 11 (115-nt), and extension of exon 11. The intra-exon skipping of exon 4 results in the use of an alternative translation start codon. The extension of exon 11 leads to altered downstream sequence and an early termination. The majority of these alternatively spliced regions have been experimentally identified in many tissues, including in multiple regions of the central nervous system.6 The partial skipping of exon 5 and the deletion of exon 7 have been observed in some other vertebrates (supplementary Table 6, the asterisks).
Table S5.
mRNA | Size (bp) | Description | Protein | Size (aa) | Comments |
---|---|---|---|---|---|
AK044639.1 | 4223 | Mus musculus adult retina cDNA | Translated* | 693 | *Stop codon at the 189th position; 23-aa deletion |
AK029261.1 | 2411 | Mus musculus 0 day neonate head cDNA | Translated* | 469 | 216-aa truncation; 23-aa deletion; 8-aa deletion |
AB093274.1 | 5310 | Mus musculus mRNA for mKIAA0940 protein | BAC41458.1# | 716 | #Reported as 722 aa, but we presume it is 716 aa and identical to NP_938042, because the first methionine is the 7th aa. |
AY313774.1 | 3148 | Mus musculus cytoplasmic polyadenylation element binding protein 3 (cpeb3) mRNA, complete cds | AAQ20843.1 | 716 | Compared with NP_938042: one amino acid substitution (372nd, N -> P) |
NM_198300.2 | 5792 | Mus musculus cytoplasmic polyadenylation element binding protein 3 (cpeb3), mRNA | NP_938042.2 | 716 | |
AK147243.1 | 5660 | Mus musculus cDNA, cytoplasmic polyadenylation element binding protein 3, full insert sequence | BAE27791.1 | 716 | |
AK161513.1 | 2097 | Mus musculus adult male testis cDNA, cytoplasmic polyadenylation element binding protein 3, full insert sequence | BAE36436.1 | 561 | 23-aa deletion; Early termination with conversion: VELA ->GEWK |
BC128377.1 | 2142 | Mus musculus cytoplasmic polyadenylation element binding protein 3, mRNA, complete cds | AAI28378.1 | 477 | 216-aa truncation; 23-aa deletion |
BC128378.1 | 465 | Mus musculus cDNA | – | | Disconnected translation |
Table S6.
C. elegans | 152 | 930 | 159 | 601 | 213 | 747 | NM_059279.5 | |||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
chimpanzee | 193## | 1019 | 160 | 57 | 141 | 90 | 119 | 115 | 182 | 1792 | XM_001145135.1 | |
cow | 1012 | 160 | 57 | 141 | 90 | 119 | 115 | 182 | 3784 | XM_875092.3 | ||
frog | 1098 | 91 | 57 | 24 | 168 | 90 | 119 | 115 | 182 | 331 | NM_001015925.2 | |
horse | 161 | 1010 | 160 | 57 | 141 | 90 | 119 | 115 | XM_001917417.1 | |||
human | 193## | 1016 | 160 | 57 | 141 | 90 | 119 | 115 | 182 | 3800 | NM_014912.4 | |
mouse | 61 | 1019 | 160 | 57 | 24 | 168 | 90 | 119 | 115 | 182 | 3797 | NM_198300.2 |
rat | 1209 | 160 | 57 | 24 | 168 | 90 | 119 | 115 | 182 | 345 | XM_220043.5 | |
zebrafish | 814 | 85 | 57 | 24 | 168 | 90 | 119 | 115 | 182 | 252 | NM_001167662.1 | |
* | * | # |
Six protein sequences of mouse Cpeb3 were extracted from the UniProt and the NCBI protein databases. Sequence comparisons demonstrated four variable regions (Fig. 3C). The alternative usage of a 23-aa region and an 8-aa region is attributable to the alternative splicing in exon 5 and exon 7, respectively. A 216-aa N-terminal truncation may result from the use of an alternative translation initiation codon when intra-exon skipping occurs in exon 4. A 132-aa C-terminal truncation, which removes the majority of the second RRM and terminates with four distinct amino acids (Fig. 3C, Q7TN99-5), can be derived from the extension of exon 11 (Fig. 3A).
The sequences and locations of the 23-aa and the 8-aa regions are conserved in human CPEB3 and mouse Cpeb3 (Fig. 3C, 3D). One human isoform has a deletion of 17-aa, which includes the 8-aa and the adjacent 9-aa C-terminal to it (Fig. 3D, Q8NE35-1). To further explore the validity of the 17-aa deletion, we compared additional organisms. A particular exon (supplementary Table 6, column denoted by the pound sign) shows two variants (141-nt or 168-nt) among the species investigated. We postulated that the presence of the 141-nt was due to a 27-nt skipping within the 168-nt exon. Sequence alignment indicated that this 168-nt is exon 8 in mouse Cpeb3 (Fig. 3A), and the 27-nt skipping would occur in the beginning of exon 8 (Fig. 3A, the part of exon 8 highlighted in gray), and would lead to a deletion in the protein product of 9-aa next to the aforementioned 8-aa. The removal of the 8-aa disrupts a Pkb recognition site, whereas the deletion of the 17-aa abolishes the Pkb phosphorylation site as well as a Pka phosphorylation site (Table 2). The 27-nt deletion has been detected in mouse testis and to a lesser degree in the hippocampus, cortex, and olfactory bulb (Fig. 3B).
Table 2.
Elm name | Instances | Positions | Elm description | Pattern |
---|---|---|---|---|
CLV_NDR_NDR_1 | RRG | 58–60 | N-Arg dibasic convertase (nardilysine) cleavage site (Xaa-|-Arg-Lys or Arg-|-Arg-Xaa) | .RK|RR[^KR] |
CLV_PCSK_PC7_1 | RSYGRRR | 53–59 | Proprotein convertase 7 (PC7, PCSK7) cleavage site (Arg-Xaa-Xaa-Xaa-[Arg/Lys]-Arg-|-Xaa) | [R]...[KR]R. |
LIG_14-3-3_2 | RGRSSLF | 59–65 | Longer mode 2 interacting phospho-motif for 14-3-3 proteins with key conservation RxxxS#p. | R..[^P][ST][IVLM]. |
LIG_BRCT_BRCA1_1 | RSSLF | 61–65 | Phosphopeptide motif which directly interacts with the BRCT (carboxy-terminal) domain of the Breast Cancer Gene BRCA1 with low affinity | .S..F |
LIG_Clathr_ClatBox_1 | LFPFE | 64–68 | Clathrin box motif found on cargo adaptor proteins, it interacts with the beta propeller structure located at the N-terminus of Clathrin heavy chain. | L[IVLMF].[IVLMF][DE] |
LIG_EVH1_II | PPMSF | 18–22 | Proline-rich motif binding to signal transduction class II EVH1 domains | PP..F |
LIG_FHA_1 | PGTDNIM | 42–48 | Phosphothreonine motif binding a subset of FHA domains that show a preference for a large aliphatic amino acid at the pT + 3 position. | ..(T)..[ILV]. |
LIG_FHA_2 | IRTDHEP | 1–7 | Phosphothreonine motif binding a subset of FHA domains that have a preference for an acidic amino acid at the pT + 3 position. | ..(T)..[DE]. |
LIG_PDZ_3 | HEPL | 5–8 | Class III PDZ domains binding motif | .[DE].[IVL] |
LIG_PDZ_3 | TDNI | 44–47 | Class III PDZ domains binding motif | .[DE].[IVL] |
LIG_SH3_3 | HYPPSGP | 12–18 | This is the motif recognized by those SH3 domains with a non-canonical class I recognition specificity | ...[PV]..P |
MOD_Cter_Amidation | YGRR | 55–58 | Peptide C-terminal amidation | (.)G[RK][RK] |
MOD_GlcNHglycan | PSGP | 15–18 | Glycosaminoglycan attachment site | [ED](0,3).(S)[GA]. |
MOD_PKA_2 | GRSSLFP | 60–66 | Pka phosphorylation site | .R.([ST])... |
MOD_PKB_1 | RRRGRSSLF | 57–65 | Pkb Phosphorylation site | R.R..([ST])... |
TRG_PEX | WRNHF | 27–31 | Specific ELM present in Pex5p and binding to Pex13p and Pex14p. Part of the peroxisomal matrix protein import system | W...[FY] |
LIG_MAPK_1 | KGRMGINF | 9–16 (numbers refer to sequences * and **) | MAPK interacting molecules (e.g. MAPKKs, substrates, phosphatases) carry docking motif that help to regulate specific interaction in the MAPK cascade. The classic motif approximates (R/K)xxxx#x# where # is a hydrophobic residue. | [KR](0,2)[KR].(0,2)[KR].(2,4)[ILVM].[ILVF] |
Note: The numeric positions in column 3 refer to positions in the following sequence investigated: IRTDHEPLKGKHYPPSGPPMSFADIMWRNHFAGRMGINFHHPGTDNIMALNTRSYGRRRGRSSLFPFED with the exception that the number in the last row refers to positions in the following two sequences: *IRTDHEPLKGGRMGINFHHPGTDNIMALNTRSYGRRRGRSLFPFED and **IRTDHEPLKGGRMGINFHHPGTDNIMALNGRSLFPFED. The letters in bold represent the 23-aa and/or the 8-aa regions.
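The motif predictions in Table 2 can be approximated with ordinary regular expressions, as in the following Python sketch. It is not the Eukaryotic Linear Motif resource itself: it only applies two patterns copied from the Pattern column of Table 2 to the full-length sequence given in the note. The coordinates used for the 8-aa region (TRSYGRRR, positions 52–59) and the adjacent 9-aa region (GRSSLFPFE, positions 60–68) are assumptions inferred from comparing the sequences in the note, and the function names are illustrative.

```python
import re

# Full-length sequence from the note to Table 2.
REGION = ("IRTDHEPLKGKHYPPSGPPMSFADIMWRNHFAGRMGINFHHPGTDNIMALNTR"
          "SYGRRRGRSSLFPFED")

# Patterns copied from the "Pattern" column of Table 2.
PATTERNS = {
    "MOD_PKA_2": r".R.([ST])...",
    "MOD_PKB_1": r"R.R..([ST])...",
}


def scan(sequence, pattern):
    """Return (start, end, instance) for every match, 1-based inclusive.
    The lookahead lets overlapping matches be reported as well."""
    hits = []
    for m in re.finditer(f"(?=({pattern}))", sequence):
        instance = m.group(1)
        hits.append((m.start() + 1, m.start() + len(instance), instance))
    return hits


def delete(sequence, start, end):
    """Remove residues start..end (1-based inclusive)."""
    return sequence[:start - 1] + sequence[end:]


# Assumed coordinates (see lead-in): 8-aa = 52-59, 8-aa + 9-aa = 52-68.
minus_8 = delete(REGION, 52, 59)
minus_17 = delete(REGION, 52, 68)

for label, seq in [("full", REGION), ("-8 aa", minus_8), ("-17 aa", minus_17)]:
    for name, pattern in PATTERNS.items():
        print(label, name, scan(seq, pattern))
```

Under these assumptions the full sequence reproduces the Pka (60–66) and Pkb (57–65) entries of Table 2, the 8-aa deletion removes only the Pkb match, and the 17-aa deletion removes both.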
One human CPEB3 isoform (BAG54433.1) has a 317-aa N-terminal truncation (Fig. 3D). The strong homology between mouse and human prompted us to question the validity of the 317-aa truncation in the human and the 216-aa truncation in the mouse (Fig. 3C). The 216-aa truncation in mouse Cpeb3 is derived from an intra-exon skipping of exon 4 (Fig. 3A). The corresponding cDNA (AK127060.1) for the human CPEB3 isoform with the 317-aa truncation has a very short 5′ UTR. Computational translation indicated that the truncation of 317-aa could be due to an incomplete 5′ sequence of the cDNA. Based on the stringent conservation between mouse and human, particularly in the alternatively spliced regions (Fig. 6C–D), it is possible that this human CPEB3 protein isoform has an N-terminal truncation of 216-aa instead of 317-aa. Techniques such as 5′ rapid amplification of cDNA ends (5′ RACE) may be used in future studies to obtain the complete 5′ sequence of this transcript as a means to confirm the length of the N-terminal truncation in human CPEB3.
More drastic changes appear in the Cpeb3 isoform with a 132-aa C-terminal truncation and an altered tail (VELA→GEWK) (Fig. 3C, and yellow box in Fig. 5). This isoform lacks the majority of the second RRM, including its octamer consensus. This is likely to have a significant impact on RNA-protein interaction and specificity. The DNA or RNA binding proteins thus far identified have one to four RRMs.16 NMR characterization of the structures of several proteins revealed different binding mechanisms for even- and odd-numbered RRMs. For example, heterogeneous nuclear ribonucleoprotein A1 (Hnrnpa1) and polypyrimidine tract binding protein (Ptbp), which contain two and four RRMs, respectively, form homodimers when binding to two molecules of mRNA in anti-parallel arrangement.17,18 In contrast, poly(A) binding protein 2 (Pabp2), which has a single RRM, forms homodimers in the absence of RNA, but becomes monomeric upon mRNA binding.19 Thus the Cpeb3 isoform with a 132-aa C-terminal truncation and an altered tail would likely have its binding characteristics altered.
The largest predicted size for mouse Cpeb3 is approximately 78 kD, but a protein greater than 100 kD has also been detected with antibodies in western blots.6,8 This larger protein has been proposed to be a pre-protein.6 Since Cpeb2 and Cpeb3 are closely related in the Cpeb family (Fig. 6A), the recent finding of the extra-long Cpeb2 (Fig. 2A, C, D, the top isoforms) makes it plausible to postulate the presence of a similar extra-long isoform for Cpeb3. Both human and chimpanzee have a beginning exon of 193-nt (supplementary Table 6, the double pound signs) which, if mapped to the mouse genome, would be adjacent to the 61-nt exon. In addition, one isoform of mouse Cpeb3 indeed has an extended “5′ UTR” (Fig. 3A, AK044639.1). A putative translation indicates that an upstream extension of the 61-nt exon into and beyond the 193-nt would lead to a continuous extension of Cpeb3 protein beyond the N-terminus. We analyzed a genomic sequence of about 2000-bp upstream of the first exon, and found that for the extended translation to be long enough (that is, to match the difference between ~100 kD and 78 kD), additional upstream splice(s) may be necessary (Fig. 3, insert). Additional experimental evidence is required to determine the validity and the exact length of an extra-long Cpeb3.
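The size of the extension needed to account for the larger band can be estimated with a short Python sketch. It assumes an average residue mass of about 110 Da, a common rule of thumb rather than a value from this study, and treats 100 kD as a lower bound because the band was reported as greater than 100 kD. Apparent masses on western blots can deviate from sequence-based predictions, so the figures are rough guides only. The function and constant names are illustrative.

```python
AVG_RESIDUE_MASS_DA = 110  # rule-of-thumb average residue mass (assumption)


def extra_residues(observed_kd, predicted_kd, residue_da=AVG_RESIDUE_MASS_DA):
    """Approximate number of additional residues implied by a mass difference."""
    return round((observed_kd - predicted_kd) * 1000 / residue_da)


extra_aa = extra_residues(100, 78)   # residues needed to reach ~100 kD
extra_nt = extra_aa * 3              # corresponding coding nucleotides

# Codons available if the 61-nt and 193-nt exons were translated in full.
available_codons = (61 + 193) // 3

print("716 aa is roughly", 716 * AVG_RESIDUE_MASS_DA / 1000, "kD")
print("additional residues needed: at least", extra_aa, f"({extra_nt} nt)")
print("codons in the 61-nt plus 193-nt exons:", available_codons)
```

With these assumptions roughly 200 additional residues (about 600 nt) would be required, more than the 61-nt and 193-nt exons could supply together, which is in line with the suggestion that further upstream sequence would be needed.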
Cpeb4 protein isoforms with internal deletions of 17-aa or 8-aa, or an N-terminal truncation of 382-aa
Sixteen cDNA sequences representing mouse Cpeb4 were extracted from the UniGene database (supplementary Table 7). After removing the fragmented and redundant sequences, we used three remaining cDNA sequences for sequence alignment. Two additional isoforms based on a previous report were also used for analysis5 (Fig. 4A). The comparison of these five sequences demonstrated that variations in exon 1 could lead to different 5′ UTRs or alternative translation initiation sites. Variations in the length of the last exon could lead to different 3′ UTRs. Alternative splicing of exon 3 and exon 4 would likely result in the removal of 51-nt and 24-nt, respectively (Fig. 4A). All four isoforms related to altered exons 3 and 4 have been identified in mouse brain tissue.5 Two isoforms in the same region are also evident in adult mouse retina (Fig. 4B). The alternative use of these two exons in Cpeb4 is highly conserved among vertebrates (supplementary Table 8, the asterisks).
Table S7.
mRNA | Size (bp) | Description | Protein | Size (aa) | Comments |
---|---|---|---|---|---|
AK089951.1 | 1774 | Mus musculus kidney CCL-142 RAG cDNA | Translated* | 128 (1–128) | Fragmented |
AK088039.1 | 1904 | Mus musculus 2 days neonate thymus thymic cells cDNA | – | | Disconnected translation |
AK079421.1 | 3217 | Mus musculus adult male bone cDNA, | – | | Disconnected translation |
AY313775.1 | 2313 | Mus musculus cytoplasmic polyadenylation element binding protein 4 (cpeb4) mRNA, complete cds | AAQ20844.1 | 729 | |
AK173229.1 | 7585 | Mus musculus mRNA for mKIAA1673 protein | BAD32507.1 | 704 | 25-aa deletion |
BC079599.1 | 5437 | Mus musculus cytoplasmic polyadenylation element binding protein 4, mRNA | Translated* | 339 (383–721) | 8-aa deletion |
AK162101.1 | 2164 | Mus musculus in vitro fertilized eggs cDNA, | BAE36725.1 | 262 (1–262) | Fragmented |
AK154289.1 | 2438 | Mus musculus NOD-derived CD11c +ve dendritic cells cDNA | BAE32491.1 | 338 (1–338) | Fragmented |
BC115431.1 | 2279 | Mus musculus cytoplasmic polyadenylation element binding protein 4, mRNA | AAI15432.1 | 704 | 25-aa deletion |
BC115430.1 | 2279 | Mus musculus cytoplasmic polyadenylation element binding protein 4, mRNA | AAI15431.1 | 704 | 25-aa deletion |
BC145865.1 | 2300 | Mus musculus cytoplasmic polyadenylation element binding protein 4, mRNA | AAI45866.1 | 729 | |
BC145863.1 | 2300 | Mus musculus cytoplasmic polyadenylation element binding protein 4, mRNA | AAI45864.1 | 729 | |
AK021394.1 | 1066 | Mus musculus 0 day neonate eyeball cDNA | Translated* | 141 (589–729) | Fragmented |
AK015401.1 | 1586 | Mus musculus adult male testis cDNA | BAB29832.1 | 295 (435–729) | |
AK015381.1 | 1077 | Mus musculus adult male testis cDNA | BAB29821.1 | 295 (435–729) | |
NM_026252.3 | 2312 | Mus musculus cytoplasmic polyadenylation element binding protein 4 (cpeb4), mRNA | NP_080528.2 | 729 |
Table S8.
chimpanzee | 39 | 24 | 174 | 90 | 119 | 115 | 182 | 5091 | XM_001155021.1 | ||
cow | 203 | 82 | 174 | 90 | 119 | 115 | 182 | 4401 | NM_001105420.1 | ||
dog | 1682 | 82 | 51 | 174 | 90 | 119 | 115 | 182 | 4739 | XM_536428.2 | |
human | 2531 | 82 | 51 | 24 | 174 | 90 | 119 | 115 | 182 | 4401 | NM_030627.2 |
horse | 1662 | 82 | 51 | 24 | 174 | 90 | 119 | 115 | 182 | 4421 | XM_001502804.2 |
marmoset | 2541 | 82 | 51 | 24 | 174 | 90 | 119 | 115 | 182 | 4535 | XM_002744562.1 |
mouse | 1202 | 82 | 51 | 24 | 174 | 90 | 119 | 115 | 182 | 273 | NM_026252.3 |
rabbit | 1125 | 82 | 51 | 24 | 174 | 90 | 119 | 115 | 182 | 228 | XM_002710388.1 |
rat | 1125 | 82 | 51 | 174 | 90 | 119 | 115 | 182 | 892 | NM_001106992.1 | |
Rhesus | 99 | 82 | 174 | 90 | 119 | 115 | 182 | 4231 | XM_001097641.1 | ||
zebrafish | 25 | 1172 | 93 | 384? | 115 | 182 | 375 | NM_200981.1 | |||
* | * |
Four isoforms of mouse Cpeb4 proteins were extracted from the UniProt database. The alignment of the protein isoforms reflects the deletions of 17-aa and 8-aa motifs (Fig. 4C), which correspond to the removal of exon 3 and exon 4, respectively (Fig. 4A). Computational translation of cDNA BC079599.1 yielded a novel protein with a 382-aa N-terminal truncation (Fig. 4C). The deletion of 382-aa, like the deletions of the 17-aa and 8-aa motifs, was identified in human CPEB4 at the same locations (Fig. 4D). The sequences of these alternatively spliced regions were also strictly conserved between mouse and human (Fig. 6D).
Across-paralog comparison of mouse Cpeb1, 2, 3 and 4
The overall sequence of Cpeb1 has been reported to have low homology to Cpeb2–4.5 We demonstrate here that the alternatively spliced regions of Cpeb1 are also quite different from those of Cpeb2–4 (Fig. 5, 6C–D). This strengthens the previously reported notion that Cpeb1 is a distant cousin of Cpeb2–4. Cpeb2, 3, and 4 have almost identical RRMs but variable N termini. Of interest, an 8-aa motif within the variable region is stringently conserved among Cpeb2–4. This motif is located N-terminal to the first RRM (Fig. 5, the red box). Its deletion removes certain functional motifs for protein cleavage, protein-protein interaction, and phosphorylation (Tables 1–3). Whereas the deletion of the 8-aa motif alone disrupts a Pkb recognition site, the newly identified 9-aa deletion adjacent to the 8-aa motif in Cpeb3 (Fig. 5, orange box) would remove both this Pkb phosphorylation site and a Pka phosphorylation site (Table 2). Based on the sequence homology among Cpeb2–4, it is plausible that the 9-aa region is alternatively spliced in Cpeb2 and Cpeb4 as well. The deletion of the 9-aa region would likely remove a Pka phosphorylation site in Cpeb2 as well (Table 1). However, this Pka phosphorylation site is absent from Cpeb4 due to a single amino acid substitution (GRSSLLP → GQSSLLP, Fig. 5).
Table 1.
Elm name | Instances | Positions | Elm description | Pattern |
---|---|---|---|---|
CLV_NDR_NDR_1 | RRG | 94–96 | N-Arg dibasic convertase (nardilysine) cleavage site (Xaa-\|-Arg-Lys or Arg-\|-Arg-Xaa) | .RK\|RR[^KR] |
CLV_PCSK_PC7_1 | RSYGRRR | 89–95 | Proprotein convertase 7 (PC7, PCSK7) cleavage site (Arg-Xaa-Xaa-Xaa-[Arg/Lys]-Arg-\|-Xaa) | [R]...[KR]R. |
LIG_14-3-3_2 | RMYDSLN | 44–50 | Longer mode 2 interacting phospho-motif for 14-3-3 proteins with key conservation RxxxS#p | R..[^P][ST][IVLM]. |
LIG_14-3-3_2 | RGRSSLF | 95–101 | Longer mode 2 interacting phospho-motif for 14-3-3 proteins with key conservation RxxxS#p | R..[^P][ST][IVLM]. |
LIG_APCC_Dbox_1 | GRSSLFPID | 96–104 | An RxxL-based motif that binds to the Cdh1 and Cdc20 components of APC/C thereby targeting the protein for destruction in a cell cycle dependent manner | .R..L..[LIVM]. |
LIG_BRCT_BRCA1_1 | RSSLF | 97–101 | Phosphopeptide motif which directly interacts with the BRCT (carboxy-terminal) domain of the Breast Cancer Gene BRCA1 with low affinity | .S..F |
LIG_Clathr_ClatBox_1 | LFPID | 100–104 | Clathrin box motif found on cargo adaptor proteins, it interacts with the beta propeller structure located at the N-terminus of Clathrin heavy chain. | L[IVLMF].[IVLMF][DE] |
LIG_FHA_1 | AGTSRID | 33–39 | Phosphothreonine motif binding a subset of FHA domains that show a preference for a large aliphatic amino acid at the pT + 3 position. | ..(T)..[ILV]. |
LIG_FHA_1 | PGTDNLL | 78–84 | Phosphothreonine motif binding a subset of FHA domains that show a preference for a large aliphatic amino acid at the pT + 3 position. | ..(T)..[ILV]. |
LIG_PDZ_3 | SDSL | 22–25 | Class III PDZ domains binding motif | .[DE].[IVL] |
LIG_PDZ_3 | YDSL | 46–49 | Class III PDZ domains binding motif | .[DE].[IVL] |
LIG_PDZ_3 | HDPL | 66–69 | Class III PDZ domains binding motif | .[DE].[IVL] |
LIG_PDZ_3 | TDNL | 80–83 | Class III PDZ domains binding motif | .[DE].[IVL] |
LIG_USP7_1 | AWGSD | 19–23 | The USP7 NTD domain binding motif variant based on the MDM2 and P53 interactions. | [PA][^P][^FYWIL]S[^P] |
LIG_USP7_1 | AGTSR | 33–37 | The USP7 NTD domain binding motif variant based on the MDM2 and P53 interactions. | [PA][^P][^FYWIL]S[^P] |
MOD_CK1_1 | SWCTAAG | 28–34 | CK1 phosphorylation site | S..([ST])... |
MOD_Cter_Amidation | YGRR | 91–94 | Peptide C-terminal amidation | (.)G[RK][RK] |
MOD_GSK3_1 | GSDSLQDS | 21–28 | GSK3 phosphorylation recognition site | ...([ST])...[ST] |
MOD_GSK3_1 | SWCTAAGT | 28–35 | GSK3 phosphorylation recognition site | ...([ST])...[ST] |
MOD_GSK3_1 | NMHSLENS | 50–57 | GSK3 phosphorylation recognition site | ...([ST])...[ST] |
MOD_PKA_2 | VRSSLQL | 11–17 | Pka phosphorylation site | .R.([ST])... |
MOD_PKA_2 | GRLSYPH | 71–77 | Pka phosphorylation site | .R.([ST])... |
MOD_PKA_2 | GRSSLFP | 96–102 | Pka phosphorylation site | .R.([ST])... |
MOD_PKB_1 | RRRGRSSLF | 93–101 | Pkb phosphorylation site | R.R..([ST])... |
MOD_PLK | LENSLID | 54–60 | Site phosphorylated by the Polo-like-kinase | .[DE].[ST][ILFWMVA].. |
TRG_ENDOCYTIC_2 | YDSL | 46–49 | Tyrosine-based sorting signal responsible for the interaction with mu subunit of AP (Adaptor Protein) complex | Y..[LMVIF] |
Note: The numeric positions in column 3 are based on the following sequence investigated: NNSNTLLPLQVRSSLQLPAWGSDSLQDSWCTAAGTSRIDQDRSRMYDSLNMHSLENSLIDIMRAEHDPLKGRLSYPHPGTDNLLMLNARSYGRRRGRSSLFPIDD. The letters in bold represent the 30-aa and the 8-aa regions.
Table 3.
Elm name | Instances | Positions | Elm description | Pattern |
---|---|---|---|---|
CLV_NDR_NDR_1 | RRG | 34–36 | N-Arg dibasic convertase (nardilysine) cleavage site (Xaa-\|-Arg-Lys or Arg-\|-Arg-Xaa) | .RK\|RR[^KR] |
CLV_PCSK_PC7_1 | RTYGRRR | 29–35 | Proprotein convertase 7 (PC7, PCSK7) cleavage site (Arg-Xaa-Xaa-Xaa-[Arg/Lys]-Arg-\|-Xaa) | [R]...[KR]R. |
LIG_14-3-3_2 | RGQSSLF | 35–41 | Longer mode 2 interacting phospho-motif for 14-3-3 proteins with key conservation RxxxS#p. | R..[^P][ST][IVLM]. |
LIG_BRCT_BRCA1_1 | QSSLF | 37–41 | Phosphopeptide motif which directly interacts with the BRCT (carboxy-terminal) domain of the Breast Cancer Gene BRCA1 with low affinity | .S..F |
LIG_Clathr_ClatBox_1 | LFPME | 40–44 | Clathrin box motif found on cargo adaptor proteins, it interacts with the beta propeller structure located at the N-terminus of Clathrin heavy chain. | L[IVLMF].[IVLMF][DE] |
LIG_PDZ_3 | NDSI | 6–9 | Class III PDZ domains binding motif | .[DE].[IVL] |
MOD_CK1_1 | SDSSLLI | 20–26 | CK1 phosphorylation site | S..([ST])... |
MOD_Cter_Amidation | YGRR | 31–34 | Peptide C-terminal amidation | (.)G[RK][RK] |
MOD_GSK3_1 | LNYSYPGS | 13–20 | GSK3 phosphorylation recognition site | ...([ST])...[ST] |
MOD_N-GLC_1 | ENDSIK | 5–10 | Generic motif for N-glycosylation. Shakin-Eshleman et al. showed that Trp, Asp, and Glu are uncommon before the Ser/Thr position. Efficient glycosylation usually occurs when ~60 residues or more separate the glycosylation acceptor site from the C-terminus | .(N)[^P][ST].. |
MOD_N-GLC_1 | LNYSYP | 13–18 | Generic motif for N-glycosylation. Shakin-Eshleman et al. showed that Trp, Asp, and Glu are uncommon before the Ser/Thr position. Efficient glycosylation usually occurs when ~60 residues or more separate the glycosylation acceptor site from the C-terminus | .(N)[^P][ST].. |
MOD_PKB_1 | RRRGQSSLF | 33–41 | Pkb phosphorylation site | R.R..([ST])... |
MOD_PLK | SDSSLLI | 20–26 | Site phosphorylated by the Polo-like-kinase | .[DE].[ST][ILFWMVA].. |
TRG_LysEnd_APsAcLL_1 | DSSLLI | 21–26 | Sorting and internalisation signal found in the cytoplasmic juxta-membrane region of type I transmembrane proteins. Targets them from the Trans Golgi Network to the lysosomal-endosomal-melanosomal compartments. Interacts with adaptor protein (AP) complexes | [DER]...L[LVI] |
Note: The numeric positions in column 3 are based on the following sequence investigated: IMRAENDSIKGRLNYSYPGSDSSLLINARTYGRRRGQSSLFPMED. The letters in bold represent the 25-aa (the 17-aa plus the 8-aa) region.
Another common feature of Cpeb2–4 is the alternative splicing of the 17~30-aa region N-terminal to the 8-aa motif. In contrast to the 8-aa and 9-aa sequences, which are highly conserved among Cpeb2–4, the 17~30-aa sequences show little evidence of homology among the three paralogs (Fig. 5, the blue boxes). Functional predictions demonstrated that the deletion of this region removes motifs implicated in protein-protein interactions, phosphorylation, and post-translational modifications (Tables 1–3). Noteworthy among these findings is the deletion of the 23-aa motif in Cpeb3: this not only removes certain functional motifs, but also creates a novel site for Mapk interaction (Table 2).
These 17~30-aa variable regions become shorter and closer to the 8-aa motif in the order of Cpeb2→Cpeb3→Cpeb4, until the gap closes in Cpeb4 (Fig. 5). Functional predictions revealed that the linker regions may harbor class III PDZ (postsynaptic density-95, discs-large, zonula occludens-1) domain binding motifs. The numbers of such motifs vary: there are 3 in Cpeb2 (Table 1), 1 in Cpeb3 (Table 3), and none in Cpeb4 (Table 5). The sequences of the class III PDZ domain binding motifs are also different. Such motifs may recruit different Cpeb paralogs to different protein partners, thereby leading to distinctive localizations and functions.
Together, the aforementioned variations within the region enclosing the 8-aa, 9-aa, and 17~30-aa sequences would likely determine the protein-protein interactions, phosphorylation, and post-translational modifications of Cpebs. The functional significance of this region is of great interest for future studies.
Three paralogs, Cpeb1, 3 and 4, have isoforms with large N-terminal truncations. Such truncations may have a major impact on the function of the proteins. For instance, the Cpeb4 isoform with a large N-terminal truncation may be deprived of many, if not all, of the phosphorylation sites (Fig. 4C). This would make it a candidate for a dominant negative form. Our analysis using the Eukaryotic Linear Motif resource (ELM, http://elm.eu.org) indicated that the N-terminal fragments may also harbor sites such as those required for post-translational modification and protein-protein interaction (data not shown). The presence or absence of such sites may alter signaling pathways, stimulus dependence, or the development of particular protein complexes.
The C termini of RNA binding proteins determine the affinity and specificity of RNA binding. Two highly conserved short consensus motifs, a hexamer and an octamer, are separated by about 30-aa and embedded in an RRM region of approximately 90-aa that is conserved in structure but not in sequence16,20 (Fig. 5). These two short consensus motifs are deemed hallmarks of RNA binding proteins. The linker sequences between two RRMs are also highly conserved among RNA binding proteins. The hexamer and octamer consensus motifs and the linker regions are essential for protein-RNA interaction. However, the specificity of RNA binding is determined by sequences surrounding the hexamer and octamer, as well as sequences N-terminal to the first RRM and C-terminal to the second RRM.14,15 Based on the near-identical sequences of these important functional regions in Cpeb2, 3, and 4 (Fig. 5), we predict that Cpeb2, 3, and 4 recognize similar targets. However, Cpeb1, whose sequence deviates significantly from that of Cpeb2–4 even with regard to the short consensus motifs, must recognize a different set of targets and employ a distinct mechanism for RNA interaction. Indeed, certain RNA oligonucleotides interact with Cpeb3 and Cpeb4 but not with Cpeb1 protein, and the reverse is also true.8
Across-ortholog comparisons provide new insights
Comparisons across species can be instructive. In this study we found that the exon structures of Cpeb orthologs among vertebrates are almost identical. Most of the internal exons are of exactly the same lengths across a wide range of organisms from zebrafish to human (supplementary Tables 2, 4, 6, 8). With regard to the proteins, the phylogenetic tree clearly demonstrated that they are better conserved across species than across paralogs (Fig. 6A): of all currently documented Cpeb proteins, the orthologs are closer to each other than the paralogs are. It is also evident that the Cpeb protein paralogs are highly conserved between mouse and human. Apart from an additional 4-aa C-terminal “tag” in human CPEB1 (Fig. 1D), the sequences of the mouse and human orthologs are nearly identical (Fig. 6B). The patterns, locations, and sequences of the alternatively spliced regions are strictly preserved between mouse and human (Fig. 6C, 6D). For instance, the 5-aa deletion in mouse Cpeb1, although not present in mouse Cpeb2–4, is found to be identical in human CPEB1. This variation is also evident at the level of the transcripts in multiple species (supplementary Table 2). Similar findings hold for the alternatively spliced regions in Cpeb2–4, as demonstrated for multiple vertebrates at the level of the transcripts (supplementary Tables 4, 6, 8), with little or no variation between mouse and human at the level of the proteins (Fig. 6D). These findings provide strong foundations for cross-species predictions. For example, novel isoforms in one species may be predicted based on the evidence in another, as demonstrated by the discovery of the “exon 2 deletion” transcript of mouse Cpeb1 (Fig. 1A, 1C, 1D), and the partial skipping of exon 8 in mouse Cpeb3 (supplementary Table 6, Fig. 3A, 3D). Information in one species may also be borrowed to cross-examine the accuracy in another, as in the question regarding the length of the N-terminal truncation of human CPEB3.
Evidently, such comparative analysis may be used to discover unknown isoforms or to establish the correct sequences for known isoforms.
One may also exploit such logic to predict functional motifs in one species according to the other. A good example is the prediction of phosphorylation sites (PhosphoSitePlus). A few amino acids have been confirmed as phosphorylation sites in mouse or human (Figs. 1–4C, D, the solid triangles) by various techniques including nuclear magnetic resonance (NMR) and mass spectrometry (MS).21,25 Based on the stringent homology between mouse and human, the same amino acids at identical or similar locations were identified and predicted to be phosphorylation sites in the other species (Figs. 1–4C, D, the open triangles).31
Multiple levels of variability indicate extraordinary complexity in the regulation and function of Cpebs
The presence of more than one isoform for each Cpeb reveals the complexity of their regulation and functions. Each alternative splicing event may confer an additional layer of divergence in the regulation of biological and cellular functions. Variations in the UTRs, in particular, may attest to regulation at the transcriptional level, that is, alternative transcription initiation or termination. Variations in the UTRs then impose additional controls over translation, specifically, the initiation, termination, and efficiency of translation. Another layer of regulation, alternative splicing, leads to variations in the protein sequences themselves. The differences in protein sequences dictate the uniqueness of their functions. Alterations as small as a few amino acids (for example, the 5-aa insertion in Cpeb1, or the 8-aa and 9-aa deletions in Cpeb3) may alter when, where, and how the Cpebs perform their functions and connect with their targets.
Conclusions
In conclusion, our study delineated alternative splicing isoforms for mouse Cpeb1–4. New isoforms were predicted based on theoretical translation, cross-ortholog comparison, and experimental validation. Functions of the alternatively spliced regions were predicted using bioinformatics approaches. The variety of transcript structures and protein structures indicates an extraordinary complexity in the regulation and functions of the Cpebs.
Methods
Animal handling and tissue collection
All animal experimental procedures were performed in compliance with animal care regulations set by the University of Louisville Institutional Animal Care and Use Committee (IACUC) as well as the Association for Research in Vision and Ophthalmology (ARVO) statement for the use of animals in vision research.
C57BL/6 mice (Charles River Laboratories, Davis, CA) were used in this study to confirm the presence or absence of some of the isoforms predicted by in silico methods. The animals were euthanized with CO2 followed by cervical dislocation. Retinas were dissected immediately and frozen on dry ice before proceeding to RNA extraction.
RNA extraction and RT-PCR
Frozen retinas were homogenized using a PowerGen 250 homogenizer (Fisher Scientific, Pittsburgh, PA). Total RNA was extracted using RNeasy mini kits (Qiagen, Valencia, CA) following the manufacturer’s instructions. The concentration of RNA was determined using a BioPhotometer (Eppendorf, Westbury, NY), and the quality of RNA was determined by the ratio of 28S/18S on an agarose gel. RNA was stored at −80 °C for the long term.
A total of 0.2 μg of total RNA was used in a 20 μl RT reaction with AMV reverse transcriptase (Promega, Madison, WI), and 1 μl of the resulting cDNA was used for subsequent PCR. The gene-specific primers spanning the regions of interest for Cpeb1, Cpeb2, Cpeb3, and Cpeb4 were designed using Vector NTI (Analysis module, Oligo Analysis, Invitrogen, Carlsbad, CA) and obtained from IDT (Coralville, IA). The locations of these primers are shown in Figures 1–4B, and their sequences are listed in supplementary Table 9. PCR was carried out on a thermocycler using the following conditions: 95 °C for 15 min (initial activation); 40 cycles of 94 °C for 30 sec (denaturation), 50–55 °C for 30 sec (annealing; the temperature varied with the primers), and 72 °C for 30 sec (extension); followed by 72 °C for 10 min (final extension). The resulting PCR products were separated on 1% agarose gels and photographed. Individual bands were excised, purified, and sequenced to confirm their identities.
Table S9.
Gene | Primer name | Location | Sequence |
---|---|---|---|
Cpeb1 | f 1_3 | exon 1/3 | 5′-GGCTTTCTCTCTGACTTCCAGGACTC-3′ |
Cpeb1 | r 3 | exon 3 | 5′-GACTGTGTGCTGCTCTGGGCTG-3′ |
Cpeb1 | f 7 | exon 7 | 5′-TGGTAAGGATGGCAAGCACCCC-3′ |
Cpeb1 | r 8 | exon 8 | 5′-ACGGACTTCTCTAGTTCAAACACCAAA-3′ |
Cpeb2 | fwd | exon 3 | 5′-TCAGGACAGACAACAATAGTAACACA-3′ |
Cpeb2 | rev | exon 8 | 5′-ATCTATTGGAAATAGGGAAGAGCGA-3′ |
Cpeb3 | ex4 f | exon 4 | 5′-TGGATGGAGGATAACGCTTT-3′ |
Cpeb3 | ex5 f | exon 5 | 5′-CTGACCATGAGCCTCTGAAA-3′ |
Cpeb3 | ex8 d_r1 | exon 8(−27nt)/6 | 5′-CATCCAAGAAGGCGTTGTTA-3′ |
Cpeb3 | ex8 d_r2 | exon 8(−27nt)/7 | 5′-TCCAAGAAGGCGTCTCGTC-3′ |
Cpeb4 | f 2_3 | exon 2/3 | 5′-AATGATTCCATTAAAGGTCGTCTA-3′ |
Cpeb4 | f 2_4 | exon 2/4 | 5′-AAATGATTCCATTAAAGCAAGG-3′ |
Cpeb4 | f 2_5 | exon 2/5 | 5′-AATGATTCCATTAAAGGTCAGTCT-3′ |
Cpeb4 | r 5_3 | exon 5/3 | 5′-GGAAACAATGAAGACTGACCATTA-3′ |
Cpeb4 | r 5_4 | exon 5/4 | 5′-GAAACAATGAAGACTGACCTCTCC-3′ |
Cpeb4 | r 5 | exon 5 | 5′-GAGGCAATCCACCCACAA-3′ |
Gene nomenclature
All gene symbols in the manuscript abide by the guidelines recommended by the Human Genome Organization (HUGO) Gene Nomenclature Committee (HGNC), and are in accordance with the human HGNC database and the mouse genome database (MGD). For example, Cpeb (italicized) represents mouse DNA or mRNA and Cpeb (not italicized) represents mouse protein; CPEB (italicized) represents human DNA or mRNA and CPEB (not italicized) represents human protein.
Data mining, sequence alignment, and theoretical translation
Both curated mouse RefSeq sequences (NM_) and uncurated cDNAs were extracted from the UniGene database (www.ncbi.nlm.nih.gov/unigene) to collect as much information as possible. For all the other species, only RefSeq sequences were extracted for simplicity. The genomic sequences were derived from the UCSC genome database (www.genome.ucsc.edu). UCSC Blat (www.genome.ucsc.edu/cgi-bin/hgBlat; Genome: mouse; at default settings) was used to align cDNA sequences to the genome, to deduce the lengths of exons, and to define the boundaries of exons. The location of each alternative splice was determined by comparing the genomic locations of exon-exon boundaries. NCBI Blast was used to align cDNAs to one another and to remove redundant sequences from further analysis. Whenever possible, partial sequences encompassing alternative regions of the cDNA entries were confirmed in our laboratory using RT-PCR and subsequent sequencing. Mouse and human protein sequences were extracted from the UniProt database (www.uniprot.org/). The NCBI protein database (www.ncbi.nlm.nih.gov/protein/) was also explored for additional information.
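The comparison of exon boundaries described above can be illustrated with a minimal sketch. It assumes that each transcript has already been reduced to a sorted list of exon blocks in genomic coordinates (half-open start/end pairs, as a Blat alignment would provide). The coordinates and the function names `skipped_exons` and `describe` are hypothetical and chosen only for illustration; the exon lengths (82, 51, 24, and 174 nt) follow the mouse Cpeb4 row of supplementary Table 8.

```python
def skipped_exons(reference, variant):
    """Return reference exons that lie inside the variant's genomic span
    but do not overlap any exon block of the variant."""
    span_start, span_end = variant[0][0], variant[-1][1]
    missing = []
    for start, end in reference:
        if start < span_start or end > span_end:
            continue  # outside the region covered by the variant cDNA
        overlaps = any(start < v_end and v_start < end for v_start, v_end in variant)
        if not overlaps:
            missing.append((start, end))
    return missing


def describe(exon):
    """Length in nt, whether skipping keeps the reading frame, and length in aa."""
    length = exon[1] - exon[0]
    return length, length % 3 == 0, length // 3


# Hypothetical genomic coordinates; only the exon lengths come from the text.
FULL = [(1000, 1082), (2000, 2051), (3000, 3024), (4000, 4174)]
NO_EXON3 = [(1000, 1082), (3000, 3024), (4000, 4174)]
NO_EXON3_4 = [(1000, 1082), (4000, 4174)]

for variant in (NO_EXON3, NO_EXON3_4):
    for exon in skipped_exons(FULL, variant):
        print(exon, describe(exon))
```

Under these assumptions, skipping the 51-nt exon corresponds to an in-frame 17-aa deletion and skipping the 24-nt exon to an in-frame 8-aa deletion, in line with the Cpeb4 isoforms described in the Results.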
Mouse protein sequences were compared to mouse cDNA sequences with the aid of computational translation using Vector NTI software (Analyses module, Translation). Six frames (3 direct, 3 complementary) were used for each translation. The frame that gave the longest continuous read was selected, and its product was designated as the protein product. If the translation started at the first codon and that codon was not a methionine, or ended at the last codon and that codon was not a stop codon, the cDNA was considered “fragmented” and removed from further analysis. Vector NTI (Align module, AlignX, Align selected molecules) and ClustalW2 (www.ebi.ac.uk/clustalw2/, at default settings) were used to align the alternatively spliced protein isoforms as well as the human and mouse protein orthologs.
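The translation rule described above can be sketched in a few lines. This is not the Vector NTI implementation; it is a minimal illustration that assumes the standard genetic code, takes the "longest continuous read" to mean the longest stop-free stretch in any of the six frames, and resolves ties in favor of the first frame examined. The function names and the short test sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate one frame; codons not in the table (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Translations of the 3 direct and 3 complementary frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    return {(strand, offset): translate(seq[offset:])
            for strand, seq in (("+", dna), ("-", reverse))
            for offset in range(3)}


def longest_read(dna):
    """Pick the longest stop-free read and flag it as fragmented if it starts
    at the first codon without a methionine or runs to the end without a stop."""
    best = None
    for frame, protein in six_frames(dna).items():
        position = 0
        for segment in protein.split("*"):
            end = position + len(segment)
            if best is None or len(segment) > len(best["read"]):
                best = {"frame": frame, "read": segment,
                        "from_first_codon": position == 0,
                        "has_stop": end < len(protein)}
            position = end + 1
    no_start = best["from_first_codon"] and not best["read"].startswith("M")
    best["fragmented"] = no_start or not best["has_stop"]
    return best


print(longest_read("ATGTCACTATTAGCTTGA"))  # complete toy ORF
print(longest_read("TCACTATTAGCTTGA"))     # same ORF without its start codon
```

A cDNA flagged as fragmented here would correspond to entries such as AK021394.1 in the supplementary tables, which are marked "Fragmented" and excluded from further analysis.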
A phylogenetic tree was generated for Cpeb1–4 from multiple vertebrate species with the aid of Geneious Pro ver 4.8 (www.geneious.com) at the following settings. Tree Alignment Options: cost matrix, Blosum62; gap open penalty, 12; gap extension penalty, 3; alignment type, global alignment with free end gaps. Tree Builder Options: genetic distance model, Jukes-Cantor; tree build method, neighbor-joining; outgroup, none. The accession numbers of the protein sequences used to generate the phylogenetic tree are listed in supplementary Table 10.
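To make the distance model concrete, the sketch below computes a Jukes-Cantor distance for a pair of aligned protein sequences. It assumes the generic n-state form of the correction with 20 states, d = −(19/20) ln(1 − (20/19) p), where p is the proportion of differing sites; the exact formula applied by Geneious is not stated here, and the function names and toy sequences are illustrative only. The resulting pairwise distances are the input to neighbor-joining, which is not reproduced in this sketch.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in columns) / len(columns)


def jukes_cantor(p, states=20):
    """Jukes-Cantor corrected distance for an alphabet of the given size."""
    k = (states - 1) / states
    if p >= k:
        return math.inf  # saturation: the correction is undefined
    return -k * math.log(1 - p / k)


# Toy aligned sequences (hypothetical, not Cpeb sequences).
p = p_distance("MKTAYIAK", "MKTAYIAR")
print(p, jukes_cantor(p))
```

The corrected distance is always at least as large as the observed proportion of differences, because the correction accounts for multiple substitutions at the same site.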
Table S10.
Cpeb1 species | Cpeb1 accession | Cpeb2 species | Cpeb2 accession | Cpeb3 species | Cpeb3 accession | Cpeb4 species | Cpeb4 accession |
---|---|---|---|---|---|---|---|
chicken | XP_413713.2 | cow | XP_001787349.2 | chimpanzee | XP_001145135.1 | chimpanzee | XP_001155432.1 |
chimpanzee | XP_001158685.1 | horse | XP_001498950.2 | cow | XP_880185.2 | human | NP_085130.2 |
cow | XP_869784.3 | human | NP_872291.1 | frog | NP_001015925.1 | marmoset | XP_002744608.1 |
frog | NP_001017330.1 | mouse | NP_787951.1 | horse | XP_001917452.1 | mouse | NP_080528.2 |
horse | XP_001498303.2 | rat | NP_001101831.1 | human | NP_055727.3 | rat | NP_001100462.1 |
human | NP_085097.3 | zebrafish | NP_001170928.1 | mouse | NP_938042.2 | zebrafish | NP_957275.1 |
marmoset | XP_002749216.1 | | | rat | XP_220043.5 | | |
mouse | NP_031781.1 | | | zebrafish | NP_001161134.1 | | |
orangutan | NP_001126432.1 | | | | | | |
pig | NP_001090979.1 | | | | | | |
rat | NP_001099746.1 | | | | | | |
zebrafish | NP_571502.1 | | | | | | |
Functional prediction of alternatively used motifs
No 3D structural information is readily available except for the RRM structure for human CPEB3 (NCBI Cn3D database). Whereas the deletions may lead to critical changes in the secondary/tertiary structures of the proteins, we could only predict the possible functional motifs based on consideration of the linear sequences at this stage. Potential functional motifs were identified with the aid of Eukaryotic Linear Motif resource (ELM, http://elm.eu.org). Since the lengths of the functional motifs used by the ELM algorithm are within 10-aa, to keep the integrity of potential motifs, we included 10-aa N-terminal and 10-aa C-terminal to the regions encompassing the short deletions. Additional predicted and experimentally confirmed phosphorylation sites were from PhosphoSitePlus (www.phosphosite.org).
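The motif search can be approximated with ordinary regular expressions, since the ELM patterns listed in Tables 1 and 3 use regular-expression syntax. The sketch below is not the ELM server; it only scans a linear sequence with three patterns copied from Table 1 and reports 1-based positions, using a lookahead so that overlapping instances are not missed. The function names are hypothetical, and the Cpeb2 sequence is written as implied by the positions in column 3 of Table 1 (GRSSLFP at 96–102).

```python
import re

# Sequence investigated for Cpeb2 (Table 1), 105 aa.
CPEB2_REGION = (
    "NNSNTLLPLQVRSSLQLPAWGSDSLQDSWCTAAGTSRIDQDRSRMYDSLNMHSLENSLIDIM"
    "RAEHDPLKGRLSYPHPGTDNLLMLNARSYGRRRGRSSLFPIDD"
)

# Patterns copied from the Pattern column of Table 1.
PATTERNS = {
    "MOD_PKA_2": r".R.([ST])...",
    "MOD_PKB_1": r"R.R..([ST])...",
    "LIG_PDZ_3": r".[DE].[IVL]",
}


def scan(sequence, pattern):
    """Return (instance, start, end) for every match, with 1-based inclusive
    positions; the lookahead allows overlapping matches."""
    hits = []
    for match in re.finditer(f"(?=({pattern}))", sequence):
        instance = match.group(1)
        start = match.start() + 1
        hits.append((instance, start, start + len(instance) - 1))
    return hits


def with_flanks(protein, start, end, flank=10):
    """Extract residues start..end (1-based, inclusive) plus flanking residues,
    mirroring the 10-aa margins used around the short deletions."""
    return protein[max(0, start - 1 - flank):end + flank]


for name, pattern in PATTERNS.items():
    print(name, scan(CPEB2_REGION, pattern))
```

With this sequence, the MOD_PKA_2 pattern returns the three instances listed in Table 1 (positions 11–17, 71–77, and 96–102). Scanning the same region after removing a candidate deletion would show which motifs are lost or newly created.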
Supplementary Materials
Acknowledgments
We thank Dr. Ben Harrison and Dr. Eric Rouchka for helpful discussions. This study was supported by NEI R01EY017594, NCRR P20 RR16481 and NIEHS P30ES014443.
Abbreviations
- Cpeb
mouse cytoplasmic polyadenylation element binding protein, cDNA, or used as a general term for the cDNA across species
- Cpeb
mouse cytoplasmic polyadenylation element binding protein, protein, or used as a general term for the protein across species
- CPEB
human cytoplasmic polyadenylation element binding protein, cDNA
- CPEB
human cytoplasmic polyadenylation element binding protein, protein
- CPE
cytoplasmic polyadenylation element
- RRM
RNA recognition motif
- CDS
protein coding sequence
- UTR
untranslated region
- 5’ RACE
5’ rapid amplification of cDNA ends
- NMR
nuclear magnetic resonance
- MS
mass spectrometry
- Mapk
mitogen-activated protein kinase
- Pka
protein kinase A
- Pkb
protein kinase B
- Camk2a
calcium/calmodulin-dependent protein kinase 2 alpha
- Hnrnp
heterogeneous nuclear ribonucleoprotein
- Ptbp
polypyrimidine tract binding protein
- Pabp2
poly (A) binding protein 2
- ELM
eukaryotic linear motif
Footnotes
Disclosures
This manuscript has been read and approved by all authors. This paper is unique and is not under consideration by any other publication and has not been published elsewhere. The authors and peer reviewers of this paper report no conflicts of interest. The authors confirm that they have permission to reproduce any copyrighted material.
References
- 1. Hake LE, Richter JD. CPEB is a specificity factor that mediates cytoplasmic polyadenylation during Xenopus oocyte maturation. Cell. 1994;79:617–27. doi: 10.1016/0092-8674(94)90547-9.
- 2. Eliscovich C, Peset I, Vernos I, Mendez R. Spindle-localized CPE-mediated translation controls meiotic chromosome segregation. Nat Cell Biol. 2008;10:858–65. doi: 10.1038/ncb1746.
- 3. Groisman I, Huang YS, Mendez R, Cao Q, Theurkauf W, Richter JD. CPEB, maskin, and cyclin B1 mRNA at the mitotic apparatus: implications for local translational control of cell division. Cell. 2000;103:435–47. doi: 10.1016/s0092-8674(00)00135-5.
- 4. Wu L, Wells D, Tay J, et al. CPEB-mediated cytoplasmic polyadenylation and the regulation of experience-dependent translation of alpha-CaMKII mRNA at synapses. Neuron. 1998;21:1129–39. doi: 10.1016/s0896-6273(00)80630-3.
- 5. Theis M, Si K, Kandel ER. Two previously undescribed members of the mouse CPEB family of genes and their inducible expression in the principal cell layers of the hippocampus. Proc Natl Acad Sci U S A. 2003;100:9602–7. doi: 10.1073/pnas.1133424100.
- 6. Wang XP, Cooper NG. Characterization of the transcripts and protein isoforms for cytoplasmic polyadenylation element binding protein-3 (CPEB3) in the mouse retina. BMC Mol Biol. 2009;10:109. doi: 10.1186/1471-2199-10-109.
- 7. Kurihara Y, Tokuriki M, Myojin R, et al. CPEB2, a novel putative translational regulator in mouse haploid germ cells. Biol Reprod. 2003;69:261–8. doi: 10.1095/biolreprod.103.015677.
- 8. Huang YS, Kan MC, Lin CL, Richter JD. CPEB3 and CPEB4 in neurons: analysis of RNA-binding specificity and translational control of AMPA receptor GluR2 mRNA. EMBO J. 2006;25:4865–76. doi: 10.1038/sj.emboj.7601322.
- 9. Vogler C, Spalek K, Aerni A, et al. CPEB3 is associated with human episodic memory. Front Behav Neurosci. 2009;3:4. doi: 10.3389/neuro.08.004.2009.
- 10. McGrew LL, Dworkin-Rastl E, Dworkin MB, Richter JD. Poly(A) elongation during Xenopus oocyte maturation is required for translational recruitment and is mediated by a short sequence element. Genes Dev. 1989;3:803–15. doi: 10.1101/gad.3.6.803.
- 11. Paris J, Richter JD. Maturation-specific polyadenylation and translational control: diversity of cytoplasmic polyadenylation elements, influence of poly(A) tail size, and formation of stable polyadenylation complexes. Mol Cell Biol. 1990;10:5634–45. doi: 10.1128/mcb.10.11.5634.
- 12. Wilczynska A, Aigueperse C, Kress M, Dautry F, Weil D. The translational regulator CPEB1 provides a link between dcp1 bodies and stress granules. J Cell Sci. 2005;118(Pt 5):981–92. doi: 10.1242/jcs.01692.
- 13. Welk JF, Charlesworth A, Smith GD, MacNicol AM. Identification and characterization of the gene encoding human cytoplasmic polyadenylation element binding protein. Gene. 2001;263(1–2):113–20. doi: 10.1016/s0378-1119(00)00588-6.
- 14. Ding J, Hayashi MK, Zhang Y, Manche L, Krainer AR, Xu RM. Crystal structure of the two-RRM domain of hnRNP A1 (UP1) complexed with single-stranded telomeric DNA. Genes Dev. 1999;13:1102–15. doi: 10.1101/gad.13.9.1102.
- 15. Kenan DJ, Query CC, Keene JD. RNA recognition: towards identifying determinants of specificity. Trends Biochem Sci. 1991;16:214–20. doi: 10.1016/0968-0004(91)90088-d.
- 16. Burd CG, Dreyfuss G. Conserved structures and diversity of functions of RNA-binding proteins. Science. 1994;265:615–21. doi: 10.1126/science.8036511.
- 17. Crichlow GV, Zhou H, Hsiao HH, et al. Dimerization of FIR upon FUSE DNA binding suggests a mechanism of c-myc inhibition. EMBO J. 2008;27:277–89. doi: 10.1038/sj.emboj.7601936.
- 18. Perez I, McAfee JG, Patton JG. Multiple RRMs contribute to RNA binding specificity and affinity for polypyrimidine tract binding protein. Biochemistry. 1997;36:11881–90. doi: 10.1021/bi9711745.
- 19. Song J, McGivern JV, Nichols KW, Markley JL, Sheets MD. Structural basis for RNA recognition by a type II poly(A)-binding protein. Proc Natl Acad Sci U S A. 2008;105:15317–22. doi: 10.1073/pnas.0801274105.
- 20. Lorkovic ZJ, Barta A. Genome analysis: RNA recognition motif (RRM) and K homology (KH) domain RNA-binding proteins from the flowering plant Arabidopsis thaliana. Nucleic Acids Res. 2002;30:623–35. doi: 10.1093/nar/30.3.623.
- 21. Dephoure N, Zhou C, Villen J, et al. A quantitative atlas of mitotic phosphorylation. Proc Natl Acad Sci U S A. 2008;105:10762–7. doi: 10.1073/pnas.0805139105.
- 22. Villen J, Beausoleil SA, Gerber SA, Gygi SP. Large-scale phosphorylation analysis of mouse liver. Proc Natl Acad Sci U S A. 2007;104:1488–93. doi: 10.1073/pnas.0609836104.
- 23. Zanivan S, Gnad F, Wickstrom SA, et al. Solid tumor proteome and phosphoproteome analysis by high resolution mass spectrometry. J Proteome Res. 2008;7:5314–26. doi: 10.1021/pr800599n.
- 24. Munton RP, Tweedie-Cullen R, Livingstone-Zatchej M, et al. Qualitative and quantitative analyses of protein phosphorylation in naive and stimulated mouse synaptosomal preparations. Mol Cell Proteomics. 2007;6:283–93. doi: 10.1074/mcp.M600046-MCP200.
- 25. Trinidad JC, Thalhammer A, Specht CG, et al. Quantitative analysis of synaptic phosphorylation and protein expression. Mol Cell Proteomics. 2008;7:684–96. doi: 10.1074/mcp.M700170-MCP200.
- 26. The UniGene Database. [http://www.ncbi.nlm.nih.gov/unigene]
- 27. The UniProt Database. [http://www.uniprot.org]
- 28. The UCSC BLAT. [www.genome.ucsc.edu/cgi-bin/hgBlat]
- 29. The NCBI Database. [http://www.ncbi.nlm.nih.gov/sites/entrez]
- 30. The Eukaryotic Linear Motif resource. [http://elm.eu.org]
- 31. The PhosphoSitePlus. [www.phosphosite.org]
- 32. ClustalW2. [www.ebi.ac.uk/clustalw2]
- 33. Drummond AJ, Ashton B, et al. Geneious v4.8. 2009. Available from http://www.geneious.com/
- 34. Current nomenclature guidelines by the HUGO Gene Nomenclature Committee at the European Bioinformatics Institute. http://www.genenames.org.