Abstract
Background
More than two thirds of the highly expressed ribosomal protein (RP) genes in Saccharomyces cerevisiae contain introns, in sharp contrast to the five percent of genes that contain introns genome-wide. It is well established that introns carry regulatory sequences and that the transcription of RP genes is extensively and coordinately regulated. Here we test the hypotheses that introns are innately associated with heavily transcribed genes and that introns of RP genes contribute regulatory transcription factor (TF) binding sequences. Moreover, we investigate whether promoter features differ significantly between intron-containing and intronless RP genes.
Results
We find that directly measured transcription rates tend to be lower for intron-containing than for intronless RP genes. We do not observe any specifically enriched sequence motifs in the introns of RP genes other than those of the branch point and the two splice sites. Comparing the promoters of intron-containing and intronless RP genes, we detect differences in the number and position of Rap1-binding and IFHL motifs. Moreover, the analysis of the length distribution and the folding free energies suggests that, at least in a sub-population of RP genes, the 5' untranslated sequences are optimized for regulatory function.
Conclusion
Our results argue against the direct involvement of introns in the regulation of transcription of highly expressed genes. Moreover, systematic differences in motif distributions suggest that RP transcription factors may act differently on intron-containing and intronless gene promoters. Thus, our findings contribute to the decoding of the RP promoter architecture and may fuel the discussion on the evolution of introns.
Findings
Background
Hypothesis and Work Plan
In this study, we investigate three hypotheses. First, introns are innately associated with heavily transcribed genes [1]. Second, introns of RP genes carry regulatory TF binding motifs [2-4]. Third, promoter features such as Rap1 binding sites or the GC base profile differ significantly between intron-containing and intronless RP genes [5]. To this end, we construct three promoter sets: intron-containing RP genes, intronless RP genes, and lowly expressed intron-containing non-RP genes [6-10]. We compare mRNA expression levels, transcription rates, 5'UTRs, and base compositions around the TSS. Additionally, we scan the promoter sequences for potential binding sites of several transcription factors and investigate their frequencies and localizations relative to the TSS. Finally, we test the effects of the identified promoter features on RP gene expression by linear regression analysis. For further background we refer to Additional file 1.
Gene Structure, Expression and Transcription Rate
The average mRNA abundance of intron-containing and intronless RP genes is not significantly different when measured with SAGE (Fig. 1B, F-test p-value: 0.166) [see Materials and Methods in Additional file 1]. We present only the results for the SAGE data set, but the analysis of microarray data sets leads to qualitatively similar results. Note that we use the terms mRNA abundance and expression level synonymously to designate the mRNA level averaged over large numbers of cells in different experimental conditions. In contrast, the transcription rate, which was measured directly by a genome-wide transcription run-on assay [11], is systematically higher in intronless RP genes in yeast cells recovering from a glucose-galactose shift (F-test p-value: 1.479e-09; Fig. 1C, D) [see Methods in Additional file 1]. This may reflect the additional costs of transcribing the intronic sequence and splicing the pre-mRNA. One could speculate that mRNA stability counterbalances the faster mRNA production of intronless RP genes. Note, however, that the average mRNA abundance is a summary value of yeast cells in different states, which do not match the conditions of the transcription run-on assay. We conclude that introns are not necessary for RP genes, and hence for yeast genes in general, to be highly expressed. These data also clearly show the concerted induction of transcription of virtually all RP genes, compared to most other genes, six hours after the shift from glucose to galactose (Fig. 1C).
In order to obtain more accurate information about the positioning of potential regulatory motifs, we incorporate transcription start site (TSS) predictions derived from 5'SAGE experiments [12]. In a recent study, TSSs were determined for the majority of yeast genes and there is good concordance between the results of the two studies [[9,12], see Additional file 1]. We use the predictions of the 5'SAGE study throughout this work. For 90 of the 100 intron-containing and 33 of the 37 intronless RP genes, we find TSS predictions in this data set. We restrict further analyses to this subset of 123 genes. Traditionally, for the study of the relative localization of transcription factor binding sites (TFBS), the translation start codon ATG is taken as a surrogate for the TSS, which can be rather inaccurate especially for genes that contain an intron in their 5'UTR (leader intron). We select an additional set of 35 lowly expressed intron-containing genes that are also present in the 5'SAGE data set in order to contrast our results for the RP genes [[13], Fig. 1B, see Additional file 2].
We estimate the lengths of the 5'UTRs as the distance between the translation start codon (ATG) and the predicted TSS from the 5'SAGE experiments, excluding introns. There is no strong dependence of the UTR length on the mRNA expression level, although the three most highly expressed genes, RPL38, RPL41A and RPL41B, have short UTRs (Fig. 2). Although the 5'UTRs of intronless RP genes are significantly longer (F-test p-value: 0.00761), the distributions are not separated; in other words, some intron-containing genes also have relatively long 5'UTRs. The most pronounced difference is observed for the longest 5'UTRs, which may form a special group. To investigate this further, we use the Vienna package to compute the folding free energy (ΔG) of the first 50 bases of each RP mRNA including the 5'UTR [14]. Among the seven intronless genes with the longest 5'UTR sequences, we find the five most stable secondary structure elements, which suggests a role in the regulation of translation [15]. Moreover, we find a significant negative correlation between the folding free energy of the 5'UTR and the mRNA abundance (correlation coefficient: -0.3, p-value: 0.00205) but not the transcription rate (correlation coefficient: 0.06, p-value: 0.5605).
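The length estimate described above can be illustrated with a minimal Python sketch. The function name `utr5_length`, the 1-based inclusive coordinate convention, and the example coordinates are assumptions made for illustration only; they are not taken from the 5'SAGE data set or from the Methods in Additional file 1.

```python
def utr5_length(tss, atg, introns=(), strand="+"):
    """Return the spliced 5'UTR length in bases (hypothetical helper).

    tss     -- genomic position of the first transcribed base (1-based)
    atg     -- genomic position of the A of the start codon (1-based)
    introns -- iterable of non-overlapping (start, end) intervals, inclusive
    strand  -- "+" or "-"
    """
    # The unspliced 5'UTR runs from the TSS to the base just before the ATG.
    if strand == "+":
        lo, hi = tss, atg - 1
    elif strand == "-":
        lo, hi = atg + 1, tss
    else:
        raise ValueError("strand must be '+' or '-'")
    if hi < lo - 1:
        raise ValueError("TSS lies downstream of the start codon")

    length = hi - lo + 1
    # Subtract only the intron bases that fall inside the UTR span,
    # so that leader introns are excluded and coding-region introns are ignored.
    for start, end in introns:
        overlap = min(end, hi) - max(start, lo) + 1
        if overlap > 0:
            length -= overlap
    return length


# Made-up example: a 400-base span with a 300-base leader intron.
print(utr5_length(1000, 1400, introns=[(1050, 1349)]))
# Made-up example on the minus strand without an intron.
print(utr5_length(2000, 1900, strand="-"))
```

For genes with a leader intron, subtracting the intron is what distinguishes this estimate from the plain ATG-to-TSS distance.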
Distribution of Rap1 Binding Motifs
For factors that are known to regulate RP gene transcription and those that have been predicted by genome-scale experiments, we select position weight matrices (PWM) to represent the binding specificity and scan the region from 600 bp upstream of the TSS to 600 bp downstream of all the genes of our three sets for potential binding sites using T-Reg [see Additional files 1, 3, 4, 5, 6, 7, 8, 9 for methods and for more findings].
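As a generic illustration of scanning with a position weight matrix, the following Python sketch converts per-position base counts into log-odds scores and reports windows on both strands that reach a threshold. It is not the T-Reg algorithm, whose details are given in Additional file 1. The toy four-column count matrix, the pseudocount, the uniform background and the threshold are all assumptions chosen for the example.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds scores."""
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def scan(matrix, seq, threshold):
    """Return (start, strand, score) for every window scoring >= threshold.

    start is always the 0-based leftmost position on the forward strand.
    """
    width = len(matrix)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - width + 1):
            window = s[i:i + width]
            if any(b not in BASES for b in window):
                continue  # skip windows containing ambiguous bases
            score = sum(col[b] for col, b in zip(matrix, window))
            if score >= threshold:
                start = i if strand == "+" else len(seq) - i - width
                hits.append((start, strand, round(score, 2)))
    return sorted(hits)


# Toy matrix for the made-up motif ACAT (8 observations per column).
toy_counts = [{"A": 8}, {"C": 8}, {"A": 8}, {"T": 8}]
toy_matrix = log_odds_matrix(toy_counts)
print(scan(toy_matrix, "GGACATGG", threshold=6.0))
print(scan(toy_matrix, "GGATGTGG", threshold=6.0))
```

In practice the threshold determines the trade-off between specificity and sensitivity, and the scanned region would be the window from 600 bp upstream to 600 bp downstream of the TSS described above.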
We scan our promoter sets for Rap1 binding motifs similar to Lascaris and colleagues using the PWM MR2 with consensus string WACAYCCRTAACATY [16]. As the general findings regarding position and orientation of Rap1 binding motifs in RP promoters are confirmed by our analysis, we focus on differences between intron-containing and intronless genes. We predict potential binding sites in all intron-containing genes except RPS22B. In contrast, we do not detect Rap1 sites in six of the 33 intronless genes (see tables in Additional file 1). Moreover, although in both sets of genes the binding sites are located mainly in the expected region between positions -500 and -160, there are characteristic differences. In intronless RP genes, the Rap1 sites are narrowly distributed around position -220 (Fig. 3a). In intron-containing genes, the sites are distributed over a broader range, mainly between positions -380 and -300, which is further upstream of the TSS (Fig. 3a). Previous studies have demonstrated that Rap1 binding sites mostly occur in pairs and in a preferred orientation [16,17]. According to our analysis, 74 of 90 intron-containing genes with predicted TSS have pairs of Rap1 sites, of which 64 are spaced less than 30 bp apart (Tab. 1 in Additional file 1, Fig. 3b). By spacing, we mean the number of bases between the two sites. In one gene, the sites are 88 bp apart and in ten genes, more than 100 bp, which we consider abnormal pairs. The preferential spacing is two to six bp. Most commonly, the two Rap1 sites occur in tandem according to the consensus given above, and second most commonly in head-to-head orientation; together these account for 95% of the cases (Fig. 3b). In this respect, there are no major differences between intron-containing and intronless RP genes (Fig. 3c). In the whole RP promoter set, we identify 13 duplicate Rap1 sites with short spacing (<30 bp) and single Rap1 sites in three genes, in addition to the findings of Lascaris and colleagues [16].
Because the newly identified Rap1 sites occur in the proper location and orientation, we are confident that our T-Reg method produces predictions that are both specific and sensitive.
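The definition of spacing used above (the number of bases between two sites) can be made concrete with a short Python sketch. As a deliberate simplification, it matches the IUPAC consensus string exactly with a regular expression instead of scoring with the PWM MR2, so it is stricter than the analysis reported here. The function names, the test sequence and the labels "tandem" (same strand) and "opposite strands" are assumptions for illustration; the sketch does not distinguish head-to-head from tail-to-tail arrangements.

```python
import re

# IUPAC nucleotide codes needed for the consensus, plus common extras.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
CONSENSUS = "WACAYCCRTAACATY"


def find_sites(seq, consensus=CONSENSUS):
    """Return sorted (start, strand) for exact consensus matches on both strands."""
    # A lookahead is used so that overlapping matches are also reported.
    pattern = re.compile("(?=(" + "".join(IUPAC[c] for c in consensus) + "))")
    width = len(consensus)
    sites = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = seq.translate(COMPLEMENT)[::-1]
    # Convert reverse-strand hits back to forward-strand start coordinates.
    sites += [(len(seq) - m.start() - width, "-") for m in pattern.finditer(rc)]
    return sorted(sites)


def describe_pairs(sites, width=len(CONSENSUS), max_spacing=30):
    """Return (spacing, arrangement) for neighbouring sites closer than max_spacing."""
    pairs = []
    for (s1, o1), (s2, o2) in zip(sites, sites[1:]):
        spacing = s2 - (s1 + width)  # bases between the two sites
        if 0 <= spacing < max_spacing:
            pairs.append((spacing, "tandem" if o1 == o2 else "opposite strands"))
    return pairs


# Made-up promoter fragment with two consensus matches separated by 4 bases.
site = "TACACCCATAACATC"
fragment = "GG" + site + "GGGG" + site + "GG"
sites = find_sites(fragment)
print(sites)
print(describe_pairs(sites))
```

With a PWM-based scanner, only `find_sites` would need to be replaced; the spacing and arrangement logic stays the same.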
Distribution of IFHL
In contrast to Fhl1 and Sfp1 motifs (see Additional file 1 for details), the IFHL motif occurs quite differently in the two RP promoter sets and is barely present in promoters of lowly transcribed genes (Additional file 9). We identify 69 instances in 40 intron-containing genes between positions -400 and -150 (Additional files 1 and 9). Nine intronless genes contain IFHL motifs, five of which are in upstream promoter regions comparable to those of intron-containing genes (position -400 to -150). The IFHL motif is preferentially located downstream of the Rap1 sites within a distance of 50 bp. Sometimes the two motifs overlap. This is in accordance with previous results [18]. Furthermore, the IFHL motifs in the upstream region of 24 intron-containing genes occur in duplicate within a distance of less than 100 bp (Tab. 1 in Additional file 1). Apart from the differences in the positioning of the Rap1 motifs mentioned above, the distribution of IFHL displays the most pronounced differences between intron-containing and intronless RP genes (Chi-squared test p-value: 0.04124).
Conclusion
Two findings of our analysis argue against the direct involvement of introns in the regulation of transcription of the highly expressed group of ribosomal protein genes. First, we show that introns are not necessary for RP genes in yeast to be heavily transcribed. Second, introns of RP genes are not enriched in binding motifs of known or putative RP transcription factors. Furthermore, we test the effect of promoter features on expression level and transcription rate by linear regression analysis. This is important because at present we cannot explain the large variety of transcription rates and of expression levels of the highly and coordinately expressed RP genes. We find that the most significant features are, for the transcription rate, the presence of introns and for the expression level, the folding free energy of the 5'-terminal sequence. Our results help to decipher the RP promoter architecture towards a prediction of transcription rates based on the presence and strength of sequence features.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
JZ carried out many of the analyses and prepared data for publication. MV provided initial ideas and concepts and helped write the manuscript. SR carried out analyses and wrote the manuscript.
Acknowledgements
J. Zhang acknowledges support received from the National Natural Science Foundation of China (30360027).
Contributor Information
Jing Zhang, Email: zhangjing@ynu.edu.cn.
Martin Vingron, Email: vingron@molgen.mpg.de.
Stefan Roepcke, Email: stefan.roepcke@nycomed.com.
References
- Warner JR, Vilardell J, Sohn JH. Economics of ribosome biosynthesis. Cold Spring Harb Symp Quant Biol. 2001;66:567–574. doi: 10.1101/sqb.2001.66.567.
- Bhattacharyya N, Banerjee D. Transcriptional regulatory sequences within the first intron of the chicken apolipoproteinAI (apoAI) gene. Gene. 1999;234:371–380. doi: 10.1016/S0378-1119(99)00183-3.
- Chen J, Hayes P, Roy K, Sirotnak FM. Two promoters regulate transcription of the mouse folylpolyglutamate synthetase gene three tightly clustered Sp1 sites within the first intron markedly enhance activity of promoter B. Gene. 2000;242:257–264. doi: 10.1016/S0378-1119(99)00507-7.
- Wenz P, Schwank S, Hoja U, Schuller HJ. A downstream regulatory element located within the coding sequence mediates autoregulated expression of the yeast fatty acid synthase gene FAS2 by the FAS1 gene product. Nucleic Acids Res. 2001;29:4625–4632. doi: 10.1093/nar/29.22.4625.
- Lascaris RF, Groot E, Hoen PB, Mager WH, Planta RJ. Different roles for abf1p and a T-rich promoter element in nucleosome organization of the yeast RPS28A gene. Nucleic Acids Res. 2000;28:1390–1396. doi: 10.1093/nar/28.6.1390.
- Clark TA, Sugnet CW, Ares M Jr. Genomewide analysis of mRNA processing in yeast using splicing-specific microarrays. Science. 2002;296:907–910. doi: 10.1126/science.1069415.
- Spingola M, Grate L, Haussler D, Ares M Jr. Genome-wide bioinformatic and molecular analysis of introns in Saccharomyces cerevisiae. RNA. 1999;5:221–234. doi: 10.1017/S1355838299981682.
- Planta RJ, Mager WH. The list of cytoplasmic ribosomal proteins of Saccharomyces cerevisiae. Yeast. 1998;14:471–477. doi: 10.1002/(SICI)1097-0061(19980330)14:5<471::AID-YEA241>3.0.CO;2-U.
- Miura F, Kawaguchi N, Sese J, Toyoda A, Hattori M, Morishita S, Ito T. A large-scale full-length cDNA analysis to explore the budding yeast transcriptome. Proc Natl Acad Sci USA. 2006;103:17846–17851. doi: 10.1073/pnas.0605645103.
- Nakao A, Yoshihama M, Kenmochi N. RPG: the Ribosomal Protein Gene database. Nucleic Acids Res. 2004:D168–170. doi: 10.1093/nar/gkh004.
- Garcia-Martinez J, Aranda A, Perez-Ortin JE. Genomic run-on evaluates transcription rates for all yeast genes and identifies gene regulatory mechanisms. Mol Cell. 2004;15:303–313. doi: 10.1016/j.molcel.2004.06.004.
- Zhang Z, Dietrich FS. Mapping of transcription start sites in Saccharomyces cerevisiae using 5' SAGE. Nucleic Acids Res. 2005;33:2838–2851. doi: 10.1093/nar/gki583.
- Zhang J, Hu J, Shi XF, Cao H, Liu WB. Detection of potential positive regulatory motifs of transcription in yeast introns by comparative analysis of oligonucleotide frequencies. Comput Biol Chem. 2003;27:497–506. doi: 10.1016/j.compbiolchem.2003.09.005.
- Hofacker IL. Vienna RNA secondary structure server. Nucleic Acids Res. 2003;31:3429–3431. doi: 10.1093/nar/gkg599.
- Ringner M, Krogh M. Folding free energies of 5'-UTRs impact post-transcriptional regulation on a genomic scale in yeast. PLoS Comput Biol. 2005;1:e72. doi: 10.1371/journal.pcbi.0010072.
- Lascaris RF, Mager WH, Planta RJ. DNA-binding requirements of the yeast protein Rap1p as selected in silico from ribosomal protein gene promoter sequences. Bioinformatics. 1999;15:267–277. doi: 10.1093/bioinformatics/15.4.267.
- Beer MA, Tavazoie S. Predicting gene expression from sequence. Cell. 2004;117:185–198. doi: 10.1016/S0092-8674(04)00304-6.
- Wade JT, Hall DB, Struhl K. The transcription factor Ifh1 is a key regulator of yeast ribosomal protein genes. Nature. 2004;432:1054–1058. doi: 10.1038/nature03175.