Proceedings of the National Academy of Sciences of the United States of America. 2004 Mar 1;101(10):3504–3509. doi: 10.1073/pnas.0400182101

A variable dinucleotide repeat in the CFTR gene contributes to phenotype diversity by forming RNA secondary structures that alter splicing

Timothy W Hefferon 1, Joshua D Groman 1, Catherine E Yurk 1, Garry R Cutting 1,*
PMCID: PMC373492  PMID: 14993601

Abstract

Dinucleotide repeats are ubiquitous features of eukaryotic genomes that are not generally considered to have functional roles in gene expression. However, the highly variable nature of dinucleotide repeats makes them particularly interesting candidates for modifiers of RNA splicing when they are found near splicing signals. An example of a variable dinucleotide repeat that affects splicing is a TG repeat located in the splice acceptor of exon 9 of the cystic fibrosis transmembrane conductance regulator (CFTR) gene. Higher repeat numbers result in reduced exon 9 splicing efficiency and, in some instances, the reduction in full-length transcript is sufficient to cause male infertility due to congenital bilateral absence of the vas deferens or nonclassic cystic fibrosis. Using a CFTR minigene system, we studied TG tract variation and observed the same correlation between dinucleotide repeat number and exon 9 splicing efficiency seen in vivo. Replacement of the TG dinucleotide tract in the minigene with random sequence abolished splicing of exon 9. Replacements of the TG tract with sequences that can self-base-pair suggested that the formation of an RNA secondary structure was associated with efficient splicing. However, splicing efficiency was inversely correlated with the predicted thermodynamic stability of such structures, demonstrating that intermediate stability was optimal. Finally, substitution with TA repeats of differing length confirmed that stability of the RNA secondary structure, not sequence content, correlated with splicing efficiency. Taken together, these data indicate that dinucleotide repeats can form secondary structures that have variable effects on RNA splicing efficiency and clinical phenotype.


RNA splicing is the process by which eukaryotic cells create mature, functional mRNAs from precursor RNAs. This process requires the precise identification, excision, and ligation of many relatively brief sequences called exons that are retained in the mature transcript. At least four cis sequence elements ensure that exons are recognized and processed accurately by the spliceosome. These include the highly conserved 5′ and 3′ splice sites, and the less conserved branch point consensus and polypyrimidine tract. There is also a growing appreciation for other splicing signals. Enhancers and silencers are oligomeric sequences (typically 6–8 bp) found in or around exons. Binding of accessory splicing factors, such as SR proteins, to these sequences stabilizes or disrupts spliceosome assembly and processivity. Dinucleotide repeats constitute another family of sequences that influence splicing (1–4). Because of their abundance, dinucleotide repeats frequently occur within genes, so this family of sequences has the potential to affect the splicing of many genes. Furthermore, the polymorphic nature of these sequence elements can lead to variable effects on RNA splicing. For example, variation in the number of CA dinucleotides in a repeat in intron 13 of the endothelial nitric oxide synthase (eNOS) gene is associated with variation in RNA splicing and mRNA stability and with risk of coronary artery disease (2, 5). Differences in the binding affinity of heterogeneous nuclear ribonucleoprotein L to the CA repeat in eNOS RNA seem to underlie the variation in splicing efficiency (2).

A well-studied dinucleotide repeat that affects splicing efficiency is located in intron 8 of the cystic fibrosis transmembrane conductance regulator (CFTR) gene. A repeat of 9–13 TG dinucleotides lies immediately 5′ of a polymorphic polythymidine tract in the 3′ splice site of CFTR exon 9 (6). Inefficient splicing of exon 9 due to an abbreviated variant of the polythymidine tract called 5T is associated with disease (7–10). Several lines of evidence indicate that variation in the number of TG repeats influences the efficiency of exon 9 splicing in CFTR genes bearing 5T. First, TG repeat number correlates with the level of exon 9 splicing in CFTR transcripts from cultured nasal epithelial cells (11). Second, TG repeat number correlates with disease status among individuals carrying a severe CF mutation on one allele and the 5T variant in trans (11, 12). TG tracts of 12 or 13 dinucleotides are associated with congenital absence of the vas deferens or nonclassic cystic fibrosis pathology, whereas TG repeats of 11 dinucleotides are more common in unaffected individuals.

Given the clinical relevance of TG tract variation in the CFTR gene, we sought to understand the mechanism underlying the association between dinucleotide tract variation and exon 9 splicing efficiency. Using a previously described CFTR minigene splicing system (13), we studied this mechanism by replacing the TG tract with a variety of sequences. Results demonstrate that both the length and the content of the dinucleotide tract contribute to its effect on splicing. Interestingly, tracts composed of TG dinucleotides were associated with the highest levels of splicing efficiency, consistent with a positive role for this motif in CFTR splicing. Furthermore, tracts composed of sequences that were predicted to form hairpin secondary structures conferred relatively high splicing efficiencies. Finally, we discovered a striking inverse correlation between predicted thermodynamic stability of the hairpins and splicing efficiency, suggesting that secondary structures that are transient are associated with optimal levels of splicing.

Materials and Methods

Construction and Validation of the Minigene. The construction and validation of the minigene used in this study have been described elsewhere (13). Briefly, portions of the CFTR gene were amplified by PCR, ligated, and inserted in-frame into an ornithine aminotransferase (OAT) cDNA construct in a mammalian expression vector with a Rous sarcoma virus promoter.

Site-Directed Mutagenesis. Site-directed mutagenesis was performed by using the Transformer kit (Clontech) according to the manufacturer's instructions. Selection oligos in multiply mutated constructs alternated between a unique BsaI site and a unique BsmBI site: Bsa/Bsm+, 5′-CCACCGAGACGCCATTGGGGC-3′ and Bsm/Bsa+, 5′-TACCCCACCGAGACCCCATTGGGGCCAATACGC-3′, respectively. Mutagenic primers can be found in Table 1. Random tracts were generated by using the Visual Basic program gene-erator (K. J. Rohleder, Johns Hopkins University, Baltimore; source code available on request). Random tracts were screened for known splicing signals [AG, polypyrimidine tracts, and branch point consensus YNCURAY (Y is pyrimidine, N is any nucleotide, R is purine, and the branch point is underlined)]. Mutated minigenes were verified by sequencing, grown to bulk, and isolated with a Maxiprep kit (Qiagen, Valencia, CA).
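The random-tract screen described above can be illustrated with a short sketch. This is not the gene-erator program, whose source is not reproduced here; it is a hypothetical Python stand-in. It assumes that "screened" means rejecting any candidate that contains an AG dinucleotide, a pyrimidine run, or a match to the YNCURAY branch point consensus. How strictly the authors applied the screen is not stated, and the 5-nucleotide threshold for a polypyrimidine run is an assumption made only for this example.

```python
import random
import re

# Branch point consensus YNCURAY written as a DNA regex
# (Y = C/T, N = any base, R = A/G; U in RNA corresponds to T in DNA).
BRANCH_POINT = re.compile(r"[CT][ACGT]CT[AG]A[CT]")
# Assumed threshold: 5 or more consecutive pyrimidines count as a tract.
POLYPYRIMIDINE = re.compile(r"[CT]{5,}")


def has_splicing_signal(tract: str) -> bool:
    """Return True if the tract contains AG, a pyrimidine run, or YNCURAY."""
    return (
        "AG" in tract
        or POLYPYRIMIDINE.search(tract) is not None
        or BRANCH_POINT.search(tract) is not None
    )


def random_tract(length: int, rng: random.Random, max_tries: int = 10_000) -> str:
    """Draw random DNA tracts until one passes the (assumed) screen."""
    for _ in range(max_tries):
        tract = "".join(rng.choice("ACGT") for _ in range(length))
        if not has_splicing_signal(tract):
            return tract
    raise RuntimeError("no acceptable tract found")


# Example: one 24-nucleotide candidate, the same length as a (TG)12 tract.
print(random_tract(24, random.Random(0)))
```

A fixed seed is used only so that the example is repeatable; any candidate produced this way would still need to be checked by sequencing, as in the protocol above.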

Table 1. Mutagenic oligonucleotides used to generate constructs.

Tract Sequence
(TG)0 actcatcttttatttttga(delTG)tttttaacagggatttgggg
(TG)8 gacaaactcatcttttatttttgaTGTGTGTGTGTGTGTGtttttaacagggatttgggga
(TG)9 ctcatcttttatttttgaTGTGTGTGTGTTGTGTGTGtttttaacagggatttggg
(TG)10 ctcatcttttatttttgaTGTGTGTGTGTGTGTGTGTGtttttaacagggatttggg
(TG)11 ctcatcttttatttttgaTGTGTGTGTGTGTGTGTGTGTGtttttaacagggatttggg
(TG)12* ctcatcttttatttttgaTGTGTGTGTGTGTGTGTGTGTGTGtttttaacagggatttggg
(TG)13 ctcatcttttatttttgaTGTGTGTGTGTGTGTGTGTGTGTGTGtttttaacagggatttggg
(TG)24 ctcatcttttatttttgaTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGtttttaacagg gatttggg
N24A ctcatcttttatttttgaTTGGTCCACAAGGTTGTATTATAAtttttaacagggatttgggg
N24B ctcatcttttatttttgaTGCTATGTTCGGGCATTACCTGACtttttaacagggatttgggg
N24C ctcatcttttatttttgaTATGGCCTACGAACACGCTACTAAtttttaacagggatttgggg
N16A ctcatcttttatttttgaTGCTCCACATCCGCAAtttttaacagggatttggg
N16B ctcatcttttatttttgaTTGCCAACGAACTCGCtttttaacagggatttggg
N16C ctcatcttttatttttgaTAAACTGGATTATACAtttttaacagggatttggg
(TA)6 (rev) ccccaaatccctgttaaaaaTATATATATATAcaaaaataaaagatgagtttgtc
(TA)7 (rev) ccccaaatccctgttaaaaaTATATATATATATAtcaaaaataaaagatgagtttgtc
(TA)8 (rev) ccccaaatccctgttaaaaaTATATATATATATATAtcaaaaataaaagatgagtttgtc
(TA)11 (rev) ccccaaatccctgttaaaaaTATATATATATATATATATATAtcaaaaataaaagatgagtttgtc
(TA)12 (rev) ccccaaatccctgttaaaaaTATATATATATATATATATATATAtcaaaaataaaagatgagtttgtc
(CG)12 gacaaactcatcttttatttttgaCGCGCGCGCGCGCGCGCGCGCGCGtttttaacagggatttggggaa
(CA)12 gacaaactcatcttttatttttgaCACACACACACACACACACACACAtttttaacagggatttggggaa
A24 ctcatcttttatttttgatAAAAAAAAAAAAAAAAAAAAAAAtttttaacagggatttgggg
Pin 1 ctcatcttttatttttgaTGCGACGGACTTCGGTCCGTCGCGtttttaacag
Pin 2 actcatcttttatttttgaTATAGCGATCTTCGGATCGCTATGtttttaacagg
(TG)5TTCG(TG)5 ctcatcttttatttttgaTGTGTGTGTGTTCGTGTGTGTGTGtttttaacagggatttggg
* Original minigene sequence, shown for comparison. Substitutions are shown in capital letters, and unchanged sequences are shown in lowercase letters. Boldface letters indicate the TTCG loop motif. rev, reverse.

Cell Culture, Transfections, and Isolation of RNA. HEK293 cells (American Type Culture Collection) were grown in Eagle's minimal essential medium supplemented with 10% FBS (Biofluids, Rockville, MD) and 1% penicillin/streptomycin (Life Technologies, Rockville, MD) in 5% CO2 at 37°C in T75 flasks (Falcon). Cells were trypsinized (Biofluids), transferred to six-well plates (Falcon), allowed to adhere for >24 h, and transfected with 10 μg of plasmid and 30 μl of Lipofectamine 2000 in OptiMEM (both from Life Technologies) according to the manufacturers' instructions. After 24 h, RNA was harvested with RNAzol B (Tel-Test, Friendswood, TX) according to the manufacturer's instructions and quantified on a DU-640 spectrophotometer (Beckman Coulter).

RT-PCR and Analysis by Capillary Electrophoresis. cDNA was synthesized from 5 μg of RNA by using oligo(dT) (Roche Molecular Biochemicals) and SuperScript II reverse transcriptase (RT; Life Technologies) according to the manufacturers' instructions. cDNA was synthesized for 50 min at 42°C, followed by RT inactivation at 70°C for 15 min. One microliter of the cDNA preparation was amplified under the following conditions: 20 mM Tris·HCl, 50 mM KCl, 1 mM MgCl2, 200 μM each dNTP, 1 mM DTT, and 1 U Taq polymerase per 50 μl of reaction (Life Technologies) with an initial denaturation at 95°C for 5 min; 30 cycles of 95°C for 30 sec, 55°C for 30 sec, and 72°C for 30 sec; and an extension at 72°C for 7 min. The primer pair used for RT-PCR was designed to detect only transcripts derived from our minigene: OAT-specific forward primer E4S, 5′-GTGCTGTCAACCAAGGGC-3′ and CFTR-specific reverse primer ex10r, 5′-CAGCTGCAGTGCCAGGCATAATCCAGG-3′, fluorescently tagged at the 5′ end with 6FAM dye. PCR products were sized and quantified by capillary electrophoresis on an ABI Prism 310 genetic analyzer (Applied Biosystems) using peak area measurements.
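As a rough illustration of how peak areas can be turned into the "% exon 9+ transcripts" values reported in Results, the sketch below divides the exon 9+ peak area by the summed area of both products. The formula, the function name, and the peak areas are assumptions made for this example; the text states only that peak area measurements were used.

```python
from statistics import mean, stdev


def percent_exon9_included(area_plus: float, area_minus: float) -> float:
    """Percentage of signal in the exon 9+ peak (assumed formula)."""
    total = area_plus + area_minus
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return 100.0 * area_plus / total


# Hypothetical peak areas (arbitrary units) from three replicate transfections:
# (exon 9+ peak, exon 9- peak).
replicates = [(6100.0, 3900.0), (6050.0, 3950.0), (6120.0, 3880.0)]
values = [percent_exon9_included(plus, minus) for plus, minus in replicates]

# Report mean and sample standard deviation across replicates.
print(f"{mean(values):.1f} ± {stdev(values):.1f}% exon 9+ (n = {len(values)})")
```

Because the reverse primer carries a single 5′ dye, each amplicon contributes one fluorophore regardless of length, which is what makes a simple area ratio a plausible estimate of relative molar amounts.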

Data Management and Statistical Analysis. Means, standard deviations, and graphs were generated with Microsoft Excel. Figures were composed by using Microsoft PowerPoint. Statistical analyses were performed by using JMP version 3.2.2 (SAS Institute, Cary, NC). For each pair and for comparisons between groups, ANOVA was performed. P < 0.05 was considered significant. Values were corrected for multiple comparisons by using the Bonferroni method.

Results

The CFTR Minigene Replicates in Vivo Exon Skipping Associated with the Length of a TG Repeat. To study the effect of variation in the number of TG dinucleotides adjacent to the 5T allele on splicing of CFTR RNA transcripts, we quantified relative levels of splice products from a CFTR minigene (13). The minigene contained CFTR exon 9 and portions of exons 8 and 10, with intronic sequences flanking each exon, fused in-frame to an OAT cDNA in a mammalian expression vector (Fig. 1A). After expression of the minigene in HEK293 cells, the relative proportions of RNA splice products were determined by using fluorescent RT-PCR and capillary electrophoresis. To examine whether variation in the number of TG dinucleotides affected splicing of minigene RNA transcripts, splicing efficiency of minigenes that contained 8, 9, 10, 11, 12, 13, and 24 TG dinucleotides [(TG)8–13 and (TG)24, respectively] was assessed. Exon 9 splicing efficiency decreased as the number of repeats increased [Fig. 1B; (TG)8 = 100 ± 0.0% exon 9+ transcripts, n = 12; (TG)9 = 95.4 ± 0.4%, n = 12; (TG)10 = 89.9 ± 1.1%, n = 8; (TG)11 = 81.7 ± 1.5%, n = 9; (TG)12 = 60.8 ± 0.7%, n = 12; (TG)13 = 54.0 ± 0.2%, n = 6; (TG)24 = 8.8 ± 0.3%, n = 6]. These results are consistent with observations made in cultured nasal epithelial cells (11) and in a CFTR minigene construct similar to ours (14). Furthermore, they support the idea that natural genetic variation in this TG tract in humans influences exon 9 splicing efficiency in vivo. We concluded from these data that our CFTR minigene was a reasonable system with which to study mechanisms underlying the correlation between TG repeat number and exon 9 splicing efficiency.

Fig. 1.

Minigene design and splicing of TG tract variants. (A) The CFTR locus in humans has a variable number (9–13) of TG repeats followed by a polythymidine tract of 5, 7, or 9 Ts. All minigenes in this study contained 5 Ts because this allele is associated with the highest levels of exon-skipping in vivo and is associated with clinical phenotypes. (TG)9 and (TG)10 have only been observed with 9T or 7T. The minigene consists of portions of CFTR exons 8, 9, and 10 and flanking intronic sequences fused in-frame to an OAT cDNA construct in a mammalian expression vector. (B) Exon 9 splicing efficiency decreases as the number of TG repeats increases. The bracket indicates TG alleles that are found in cis with 5T in humans.

Substitution of the TG Tract Reduces Splicing Efficiency. To determine whether the correlation between tract length and splicing efficiency was specific to the TG repeat, the TG tract was replaced with one of three different randomized tracts of 24 nucleotides each (N24A, -B, and -C). In each case, a drastic reduction in exon 9 splicing efficiency was observed [Fig. 2A; (TG)12 = 60.8 ± 0.7%, n = 6; N24A = 3.5 ± 2.7%, n = 6; N24B = 5.0 ± 0.6%, n = 8; N24C = 2.2 ± 0.1%, n = 8]. To examine the relative effect of tract length, three additional random tracts of 16 nucleotides each (N16A, -B, and -C) (corresponding in length to eight TG repeats) were tested. Again, substantial decreases in exon 9 splicing efficiency were observed [Fig. 2B; (TG)8 = 100.0 ± 0.0%, n = 6; N16A = 16.2 ± 1.7%, n = 6; N16B = 37.0 ± 0.9%, n = 6; N16C = 20.8 ± 1.0%, n = 6]. Deletion of the entire TG tract, (TG)0, resulted in relatively efficient splicing, although not as efficient as (TG)8 [Fig. 2C; (TG)0 = 84.5 ± 4.3%, n = 9]. Furthermore, the (TG)0 construct generated unique aberrant splice products from the use of cryptic splice sites. Taken together, these results indicated that the TG tract exerted a substantial, length-dependent, and positive influence on the splicing of CFTR exon 9. However, although TG tracts positively influenced CFTR exon 9 splicing, higher numbers of repeats abrogated this effect by means of the mechanism we investigated next.

Fig. 2.

Replacement of the (TG)n tract with randomized tracts of the same total length results in drastically reduced splicing efficiency. (A) Splicing results from (TG)12 and three different random tracts of 24 nucleotides (for randomization, see Materials and Methods). (B) A TG tract of eight dinucleotides, which does not skip exon 9 at all, was replaced with three random-nucleotide stretches of equal length to the (TG)8 tract. (C) Content and relative proportions of splice products derived from the (TG)0 construct. An intermediate number of TGs (between 0 and 9) is optimal for exon 9 splicing.

Splicing Efficiency Correlates with Predicted RNA Secondary Structure. Our next objective was to explore the mechanism by which the TG tract influenced splicing efficiency. We considered the possibility that the corresponding RNA UG tract binds a protein that enhances the inclusion of exon 9. However, this mechanism is not consistent with the observation that splicing efficiency decreases when more UG dinucleotides are available for binding. Alternatively, because U and G can base-pair with one another in RNA, we considered whether the UG tract forms a secondary structure that affects splicing (15). To test this hypothesis, we replaced the (TG)12 tract with other repeat sequences, two of which could self-base-pair, (CG)12 and (TA)12, and two of which could not, (CA)12 and A24. All of the replacement tracts resulted in lower exon 9 splicing efficiencies than (TG)12 [Fig. 3A; (TG)12 = 60.8 ± 0.7%, n = 12; (CG)12 = 29.4 ± 1.1%, n = 6; (TA)12 = 31.9 ± 1.4%, n = 6; (CA)12 = 0 ± 0.0%, n = 6; A24 = 0.7 ± 1.1%, n = 9]. However, as predicted by our hypothesis, tracts bearing dinucleotides that could self-base-pair spliced significantly more efficiently than those that could not [compare (CG)12 and (TA)12 with (CA)12 and A24 in Fig. 3A; P < 0.0001]. These results supported the concept that a secondary structure involving nucleotides that can self-base-pair was formed and affected splicing efficiency.
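The distinction drawn above between repeats that can and cannot self-base-pair can be checked with a few lines of code. The sketch below is a composition check only, not a structure prediction: it asks whether any two bases in the repeat unit can form a Watson-Crick or G-U wobble pair once the DNA tract is transcribed into RNA. The function name is hypothetical.

```python
from itertools import product

# Watson-Crick pairs plus the G-U wobble pair, stored as unordered pairs.
RNA_PAIRS = {frozenset("AU"), frozenset("GC"), frozenset("GU")}


def unit_can_self_pair(unit_dna: str) -> bool:
    """True if some pair of bases within the repeat unit can base-pair as RNA."""
    rna = unit_dna.upper().replace("T", "U")
    return any(frozenset((a, b)) in RNA_PAIRS for a, b in product(rna, repeat=2))


# Repeat units of the tracts compared in Fig. 3A.
for unit in ("TG", "CG", "TA", "CA", "A"):
    print(unit, unit_can_self_pair(unit))
# TG True
# CG True
# TA True
# CA False
# A False
```

The result matches the grouping used in the text: (TG)12, (CG)12, and (TA)12 can fold back on themselves, whereas (CA)12 and A24 cannot. Whether a hairpin actually forms, and how stable it is, requires a thermodynamic model.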

Fig. 3.

Tracts capable of self-base-pairing splice exon 9 more efficiently. (A) Replacing the TG tract with dinucleotide repeats of different composition resulted in an inverse correlation between the ability to self-pair and exon 9 skipping. (CG)12 and (TA)12 tracts showed a level of skipping intermediate between (TG)12 and random sequences, whereas (CA)12 and A24 tracts skipped exon 9 100% of the time. The fact that sequences capable of intratract base-pairing spliced more efficiently than those that were not suggested a role for RNA secondary structure in the influence of these sequences on splicing. (B) Effect of replacing (TG)12 with artificial, thermostable hairpin elements of the same length as (TG)12. (TG)12 is shown for reference. Two artificially created hairpins, pin 1 and pin 2, contained the loop-forming motif TTCG flanked by randomized but complementary sequences (see Table 1 for sequences). (TG)5-TTCG-(TG)5 was generated by a 2-bp mutation in the (TG)12 construct.

To determine whether our interpretation of the results in Fig. 3A was correct, we replaced the TG tract with two synthetic sequences (pin 1 and pin 2) containing the tetranucleotide TTCG loop motif that has been shown to facilitate the formation of a thermodynamically stable RNA hairpin (16, 17). In each case, the TTCG motif was flanked by randomized but complementary nucleotides that could base-pair (Table 1). As predicted, minigenes containing pin 1 or pin 2 generated higher proportions of exon 9+ transcripts than did minigenes containing self-base-pairing (TA)12 and (CG)12 tracts (Fig. 3B; pin 1 = 35.4 ± 1.6%, n = 8; pin 2 = 46.4 ± 1.2%, n = 8). Furthermore, we noted that the TTCG loop motif could be introduced to the middle of a (TG)12 tract by mutation of two nucleotides (GT to TC) in the middle of the tract. A minigene with this modification [(TG)5TTCG(TG)5] produced the highest splicing efficiency of any 24-nucleotide tract tested [Fig. 3B; (TG)5TTCG(TG)5 = 87.4 ± 0.5%, n = 6]. These data suggested that the TG tract affected splicing by self-base-pairing to form an RNA hairpin.

Splicing Efficiency Correlates Inversely with Thermodynamic Stability of Predicted Structures. The question remained as to why different hairpin-forming sequences achieved different splicing efficiencies. Based on evidence that the thermodynamic stability of an RNA hairpin within an intron can affect its influence on the splicing reaction (18, 19), we submitted the sequences of the tracts plus 10 nucleotides of flanking sequence on either side to the RNA structure prediction program mfold. Comparison of predicted thermodynamic stability with observed splicing efficiency revealed an inverse relationship. There was a remarkably strong correlation between the predicted stability and splicing efficiency of TG tracts in the (TG)8 to (TG)24 range (R² = 0.96, P < 0.0001, y = -28.0x + 103.7). We reasoned that if the thermodynamic stability of the RNA structure created by various TG tracts correlated with splicing efficiency, then a different RNA tract that can self-base-pair should also affect splicing in a length-dependent manner. To test this idea, TA dinucleotide tracts of 6, 7, 8, 11, and 12 repeats [(TA)6–8, (TA)11, and (TA)12] were inserted into the minigene and assayed for splicing activity [Fig. 4A; (TA)6 = 83.5 ± 1.4%, n = 12; (TA)7 = 77.1 ± 2.4%, n = 12; (TA)8 = 72.0 ± 2.2%, n = 12; (TA)11 = 52.0 ± 2.0%, n = 11; (TA)12 = 44.2 ± 3.5%, n = 12]. When the TA tracts and flanking sequences were submitted to mfold, splicing efficiency of the TA tracts also demonstrated a strong inverse correlation with predicted stability (R² = 0.99, P < 0.005, y = -6.9x + 89.7). Finally, we plotted predicted stability transformed to log scale against splicing efficiency and found the two variables to be inversely correlated for all studied tracts that can self-base-pair [Fig. 4B; R² = 0.77, P < 0.0001, y = -13.7 ln(x) + 72.9].
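The kind of least-squares fit reported above can be sketched as follows. The predicted stabilities from mfold are not tabulated in the text, so this example regresses the reported mean splicing efficiencies on repeat number instead. That substitution is an assumption made purely to show the calculation, and the resulting coefficients are not the ones quoted above.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept, with R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Mean % exon 9+ transcripts as reported in Results (Fig. 1B and Fig. 4A).
tg_repeats = [8, 9, 10, 11, 12, 13, 24]
tg_efficiency = [100.0, 95.4, 89.9, 81.7, 60.8, 54.0, 8.8]
ta_repeats = [6, 7, 8, 11, 12]
ta_efficiency = [83.5, 77.1, 72.0, 52.0, 44.2]

for label, xs, ys in (("TG", tg_repeats, tg_efficiency),
                      ("TA", ta_repeats, ta_efficiency)):
    slope, intercept, r2 = linear_fit(xs, ys)
    print(f"{label}: y = {slope:.2f}x + {intercept:.1f}, R^2 = {r2:.2f}")
```

Both fits have a negative slope, in line with the length dependence described in the text. Substituting predicted free energies for repeat number as the x values would reproduce the style of analysis shown in Fig. 4.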

Fig. 4.

Splicing efficiency inversely correlated with stem loop thermostability across a broad range of minigene constructs. (A) Changes in repeat number of a TA dinucleotide tract produced the same trend seen with a TG tract. (B) Structures and thermodynamic stabilities of all tracts expected to form hairpin secondary structures, as predicted by the RNA structure-predicting program mfold (34). The oval indicates alleles seen in cis with 5T in humans. **, Construct (TG)5TTCG(TG)5, shown here for visual clarity. (CA)12, A24, and randomized tracts (N24 and N16) were not predicted to form hairpins or were predicted to form alternative structures and are not included in the plot.

Discussion

Dinucleotide repeats are abundant in the human genome (20–22). For exons that have suboptimal splicing signals, such as CFTR exon 9 (13), variation in a local dinucleotide repeat has the potential to become an important factor in determining splicing efficiency. Variation in dinucleotide repeats can affect splicing by three mechanisms. First, a change in repeat number can alter the distance between nearby cis sequences, affecting interactions between splicing signals and the spliceosome. Second, dinucleotide repeats can bind protein factors that alter RNA transcript processing. Theoretically, the binding affinity of these factors would change with repeat number and could be positive or negative. Third, dinucleotide repeats can form secondary structures, resulting in the sequestration of splicing signals, the generation of new structures for protein recognition, or interference with existing RNA–RNA, RNA–protein, or protein–protein interactions. Previous studies have suggested that the first two mechanisms are responsible for the effect of the TG dinucleotide tract on CFTR exon 9 splicing efficiency (12, 23, 24). This work favors the third explanation.

Cuppens et al. (11) suggested that a greater number of TG repeats moves the putative branch point A into a position that is unfavorable for splicing. Although we observed an increase in splicing efficiency when the TG tract was shortened [e.g., (TG)12 vs. (TG)8], these differences were small compared with the very different splicing efficiencies of dinucleotide tracts composed of the same number of repeats [e.g., (TG)12 vs. (TA)12 vs. (CG)12 vs. (CA)12]. Thus, composition of the tract seemed to be more important than length of the tract. Furthermore, we mutated four candidate branch points in our minigene construct and found that none were critical to the splicing reaction (data not shown), indicating that alternative branch points can be used by the splicing machinery, as reported for many eukaryotic genes (25–27). It is therefore unlikely that differences in the precise location of the branch point, as determined by the length of the TG tract, are the mechanism underlying variation in splicing efficiency. It also has been proposed that the TG tract is the binding target for a trans-acting splicing repressor. The TAR DNA-binding protein TDP-43 purportedly binds UG ribonucleotides in CFTR RNA transcripts and interferes with splicing in a repeat number-dependent manner (24). Increased expression of TDP-43 was reported to reduce CFTR exon 9 splicing efficiency, and antisense inhibition of TDP-43 was reported to increase exon 9 splicing (23, 24). However, that model does not explain the loss of exon 9 splicing we observed on replacement of the TG tract with random sequence. Furthermore, we showed that TA tracts demonstrate the same correlation between splicing efficiency and length variation despite having lower splicing efficiencies than TG tracts. It could be argued that the HEK293 cells used here may not express RNA-splicing repressors such as TDP-43. However, splicing of minigenes expressed in these cells replicated the association between TG tract length and exon 9 splicing efficiency observed in other cells and in vivo (11, 14, 23). These observations indicate that the mechanism underlying the association between TG tract length and splicing efficiency does not involve repression of splicing due to the presence of a TG-specific binding protein.

If the TG tract were a genuine splicing enhancer, one would expect decreased splicing efficiency when the tract was absent. Indeed, transcripts derived from the (TG)0 construct spliced exon 9 less efficiently than those from (TG)8, indicating that some number of TG repeats enhanced splicing. However, splicing efficiency of the (TG)0 construct was higher than that observed with longer TG tracts, such as (TG)12. Because (TG)0 was the only construct that used cryptic splice sites, we hypothesized that the splicing process was fundamentally different for the (TG)0 construct. Inspection of the 3′ splice site in the (TG)0 construct revealed that the abbreviated 5T polythymidine tract was effectively extended to TTTTATTTTTGATTTTT. We suspect that this interrupted but longer polythymidine tract may have compensated for the absence of the TG tract.

There is ample support for the idea that RNA secondary structure can affect splicing. Secondary structures that sequester splicing signals in yeast can inhibit the splicing machinery and affect splice site choice (28). In the chicken β-tropomyosin gene, an alternative exon is spliced depending on the level of splicing factors available to bind hairpins formed by the RNA (29). An intron of the adenovirus E1A pre-mRNA contains a hairpin structure required for efficient splicing (18). The hairpin structure enhances splicing by positioning unusually distant (>50 bp away) branch points closer to the 3′ splice site (18). This example has interesting parallels to the model we propose for the CFTR gene. In both situations, the proposed hairpin occurs between the branch point and the exon, distant branch points are involved, and hairpin-forming sequences enhance splicing of the downstream exon. Correspondingly, we hypothesize that the hairpin structure facilitates exon 9 splicing by bringing two pyrimidine tracts into proximity, similar to the situation postulated for the (TG)0 construct above. Introduction of the RNA-folding motif TTCG to the (TG)12 tract [to form (TG)5TTCG(TG)5] substantially increased its splicing efficiency, suggesting that proclivity to hairpin formation is an important factor. An additional feature of the CFTR example is that the stability of the secondary structure is important. Hairpins with higher predicted stability were associated with lower splicing efficiency. A similar relationship between stability of an RNA hairpin and splicing efficiency has recently been observed in intron 1 of the Drosophila alcohol dehydrogenase (Adh) gene (19). An evolutionarily conserved RNA hairpin in the vicinity of the 3′ splice site enhanced splicing, but mutations that increased the predicted stability of the hairpin reduced splicing efficiency (19). These studies suggest that RNA hairpin structures can assist the splicing process but that their presence must be transient to facilitate optimal splicing.

Although most work correlating secondary structure and splicing has been performed by using model organisms, there is at least one example in humans. A conserved secondary structure involving the 5′ splice site of exon 10 of the Tau gene in humans is responsible for regulating alternative splicing. When this secondary structure is disturbed by mutation, the resultant shift in splicing disrupts a balance of protein isoforms, resulting in the development of frontotemporal dementia and parkinsonism (30, 31). Triplet repeat expansions, involved in the pathogenesis of multiple neurological disorders, also have been shown to form hairpin structures, although the role of such structures in RNA splicing has not been determined (32, 33). When we examined the splicing data generated here, we noticed a particularly marked reduction in the splicing efficiency of minigenes containing either (TG)12 or (TG)13 compared with the (TG)11 minigene (circled diamonds in Fig. 4B). This observation correlates with the high risk of congenital bilateral absence of the vas deferens or nonclassic cystic fibrosis associated with (TG)12 and (TG)13 tracts and the low risk associated with the (TG)11 tract (11, 12). Thus, this work may provide a molecular explanation for the different phenotypes associated with specific TG tract-length variants in the CFTR gene. Finally, our results suggest that dinucleotide repeats within other genes should be considered as potential modifiers of RNA splicing and phenotype.

Acknowledgments

We thank Drs. Iain McIntosh and Hal Dietz for helpful ideas and stimulating discussion; Drs. Dave Valle, Hal Dietz, and Ray Kendzior for supplying the OAT cDNA construct; Kent J. Rohleder for creating geneerator software; and Rita McWilliams for statistical assistance. This work was supported by National Institutes of Health Grants DK44003 and HL68927.

Footnotes

Osborne, L. R., Alton, E. W. F. W., and Tsui, L. C. (1994) Pediatr. Pulmonol. 214, Suppl., p. 10 (abstr.).
