Abstract
The core promoter, which immediately flanks the transcription start site (TSS), plays a critical role in the transcriptional regulation of eukaryotic genes. Recent studies on higher eukaryotes have revealed an unprecedented complexity of core promoter structures that underscores diverse regulatory mechanisms of gene expression. For unicellular eukaryotes, however, core promoter structures have not been investigated in detail. Schizosaccharomyces pombe, an important model organism, still lacks precise TSS annotation, which hampers the analysis of its core promoter structures and of their relationship to those of higher eukaryotes. Here we used a deep sequencing-based approach (DeepCAGE) to generate 16 million uniquely mapped tags, corresponding to 93,736 positions in the S. pombe genome. The high-resolution TSS landscape enabled the identification of over 8,000 core promoters, the characterization of 4 promoter classes and the observation of widespread alternative promoters. The landscape also allowed precise determination of the representative TSSs within core promoters, thus redefining the 5' UTR for 82.8% of S. pombe genes. We further identified the consensus initiator (Inr) sequence – PyPyPuN(A/C)(C/A), the TATA-enriched region (between positions −25 and −37) and an Inr immediate downstream motif – CC(T/A)(T/C)(T/C/A)(A/G)CCA(A/T/C), all of which were associated with highly expressed promoters. In conclusion, the detailed analysis of core promoters not only significantly improves the genome annotation of S. pombe, but also reveals that the organization of core promoters in this unicellular eukaryote is highly similar to that in higher eukaryotes. These findings provide further evidence of the power of this model system, despite its perceived simplicity, in delineating complex regulatory processes in multicellular organisms.
Keywords: core promoter structure, fission yeast, sequence motif analysis, TSS profiling, 5' UTR annotation
Abbreviations
- CAGE
cap analysis of gene expression
- DeepCAGE
deep sequencing-based cap analysis of gene expression
- TSS
transcription start site
- Inr
initiator
- TC
tag cluster
- ORF
open reading frame
- CDS
coding sequence
- GO
Gene Ontology
- SP
single dominant peak
- DP
broad with a single dominant peak
- MP
broad with bi- or multi- peaks
- GB
generally broad distribution
- FDR
false discovery rate
- LUSP
local-ultra-sharp peak
- LDP
local-distributed peak
Introduction
Core promoters of eukaryotes are critical DNA regions in the immediate vicinity of transcription start sites (TSSs) of genes and play a central role in the regulation of transcriptional initiation.1-5 Although the core promoter structure was originally thought to be simple, recent studies have revealed an unprecedented complexity of its structure in higher eukaryotes, such as distinct sequence motifs associated with ubiquitously expressed genes, diverse TSS distribution patterns within promoters, complex local chromatin structures and epigenetic modifications around TSSs.5-13 This multifaceted complexity has contributed to the precise and dynamic control of the expression levels of various genes and is required for the accomplishment of developmental programs and the maintenance of distinct tissue types in multicellular organisms.5 Any alterations to the core promoters that could lead to functional deviations may also be an integral part of disease initiation and progression.14
To date, core promoter structures have been extensively studied in mammals and Drosophila melanogaster.5,6,15 The combination of Cap Analysis of Gene Expression (CAGE) and next-generation sequencing has dramatically accelerated the accumulation of high-quality TSS datasets and enabled precise identification of the consensus sequences and locations of core promoter elements (such as the initiator and the TATA box).6,15,16 These data have been used to generate high-resolution TSS distributions, on the basis of which core promoters can be broadly classified into 2 classes: sharp promoters and broad promoters.6 In sharp promoters, TSSs are narrowly distributed and transcription initiates primarily at a few closely spaced nucleotides; in broad promoters, TSSs are spread over a relatively wide region. In mammalian cells, sharp promoters are associated with the TATA box and correlate strongly with tissue-specific transcription, whereas broad promoters appear to be associated with CpG islands and correlate with ubiquitously expressed genes.5,6,9 Similar characteristics also seem to exist in Drosophila melanogaster.17
In contrast, the core promoter structures of important unicellular eukaryotes, such as the budding yeast (Saccharomyces cerevisiae) and the fission yeast (Schizosaccharomyces pombe),18,19 have not yet been comprehensively analyzed. For example, global TSS profiling of the well-studied S. cerevisiae was performed with capillary sequencing-based approaches,20,21 whose low sequencing depth made it difficult to accurately portray TSS distributions and to pinpoint the most frequently used TSSs within promoters. Owing to this limitation, many promoters could have been misclassified, and neither the consensus motif sequences nor the motif locations could be accurately defined. For S. pombe, the available TSS data are even sparser. Without high-depth global TSS profiling, detailed analysis of core promoter elements and promoter classes cannot be carried out. Given the importance of these model systems in a broad range of studies,22-27 this lack of promoter information could limit our ability to use these simpler organisms to decipher the relationship between promoter structure and gene regulation.
In this study, we used Deep sequencing-based Cap Analysis of Gene Expression (DeepCAGE) to profile the TSSs of S. pombe genome-wide at single-nucleotide resolution.6,15,16 By precisely portraying the TSS distribution patterns within over 8,000 core promoters, we located representative TSSs for over 4,000 genes that differ substantially from the current annotation in the S. pombe database,28 thus providing a substantial improvement over the current annotation. Based on the representative TSSs, we identified the consensus initiator (Inr) sequence and the TATA-enriched region shared by both protein-coding and non-coding genes in S. pombe. We also discovered a new motif located immediately downstream of the Inr whose presence was highly correlated with the expression levels of core promoters. Moreover, the core promoters in S. pombe were found to fall into 4 distinct classes, highly analogous to those observed in mammalian genomes.1 In addition, we found widespread alternative promoters in S. pombe. Given the ease of genetic manipulation in S. pombe,29 the structural and functional similarities between S. pombe and mammalian promoters suggest that it could be an ideal model system for understanding the biology of the various promoter classes in complex systems.
Results
Landscape of TSSs in the S. pombe genome
Deep sequencing of the CAGE library generated 20,115,604 CAGE tags, which were mapped to the S. pombe genome (Ensembl Genomes fungi release-12) using Bowtie.30 Allowing 2 mismatches in Bowtie, 16,028,846 (79.7%) tags were uniquely mapped and 2,819,094 (14.0%) tags were mapped to multiple locations. Tags mapped to rRNA genes accounted for <8.5% of all CAGE tags, indicating the high quality of our CAGE library, given that it was prepared without poly(A) selection.6 Only the uniquely mapped tags were used for further analysis. Based on the gene annotations in PomBase (http://www.pombase.org),28 we found that the vast majority of tags were located in the annotated 5′ UTR, with a tag density more than 50 times higher than that in the 3′ UTR, introns and the coding sequence (CDS) (Fig. 1A). All CAGE tags mapped to a total of 93,736 genomic positions (covering <0.4% of the S. pombe genome), with >80% of tags mapped to only 2,610 positions (0.01% of the S. pombe genome; Fig. 1B). Single tags (singletons) accounted for <0.1% of all tags. These results demonstrate a strong aggregation of CAGE tags near the annotated 5′ ends of genes, consistent with previous CAGE studies in Metazoa, and allowed the demarcation of putative TSSs with CAGE tags.1,6,31 In addition, most tags were narrowly distributed between 100 nt upstream and 200 nt downstream of the 4,684 TSSs annotated in PomBase (Fig. 1C), showing overall agreement between our CAGE data and the TSS annotations from previous publications.22,32,33
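The mapping and coverage percentages above follow directly from the raw counts. The sketch below recomputes them under two assumptions that this section does not state: a haploid genome size of roughly 12.57 Mb (an approximate figure for the ASM294v2 assembly) and TSS positions counted per strand, so that the number of possible positions is twice the genome length. Under these assumptions the quoted <0.4% and 0.01% figures are reproduced; counting a single strand would roughly double both.

```python
# Sketch: recompute the mapping and genome-coverage percentages from raw counts.
# ASSUMPTIONS (not stated in the text): genome size ~12.57 Mb, and TSS positions
# are counted per strand, so 2 x genome size positions are possible.
GENOME_SIZE_NT = 12_570_000
STRANDED_POSITIONS = 2 * GENOME_SIZE_NT

TOTAL_TAGS = 20_115_604
UNIQUE_TAGS = 16_028_846
MULTI_TAGS = 2_819_094
TSS_POSITIONS = 93_736   # positions hit by at least one uniquely mapped tag
TOP_POSITIONS = 2_610    # positions holding >80% of the tags


def pct(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


print(f"uniquely mapped: {pct(UNIQUE_TAGS, TOTAL_TAGS):.1f}%")  # 79.7%
print(f"multi-mapped:    {pct(MULTI_TAGS, TOTAL_TAGS):.1f}%")   # 14.0%
print(f"TSS positions:   {pct(TSS_POSITIONS, STRANDED_POSITIONS):.2f}% of stranded genome")
print(f"top positions:   {pct(TOP_POSITIONS, STRANDED_POSITIONS):.3f}% of stranded genome")
```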
Figure 1.

Global identification of TSSs by DeepCAGE. (A) Distribution of CAGE tags in the S. pombe genome. The distribution was based on the annotations of 5,144 protein-coding genes in PomBase. (B) Relationship between the tag number and the genome coverage. The x-axis is the proportion of the genome covered by m tags, and the y-axis is the percentage of the m tags in all uniquely mapped tags. (C) Tag distribution around annotated TSS of S. pombe genes. The highest peak is located between position +7 and position +10, and the second highest peak is located around position +61. (D) Distribution of TCs in the S. pombe genome. The position of a TC was represented by the position of its representative tag. The distribution was calculated the same way as in (A). (E) A representative gene (SPBC337.11) for those with TCs identified within their CDS. (F) Distribution of TCs in CDS. The three distribution curves were based on TCs with tag number ≥ 0th (black), 50th (red), and 90th (green) percentile of tag numbers of all TCs in CDS. The distance to the CDS start site was normalized using the length of CDS in each gene. In (E) and (F), the solid red box stands for CDS and the light green box stands for 5′ UTR and 3′ UTR.
Tags highly enriched in a small genomic region were further grouped into tag clusters (TCs), each corresponding to a putative core promoter. In total, we identified 19,287 TCs, consisting of 16,003,576 tags (99.8% of all uniquely mapped tags). Using the highest-frequency tag within each TC as the representative tag, we calculated the distribution of TCs in the S. pombe genome (Fig. 1D). (Hereafter, the positions of representative tags are used to represent the positions of TCs.) Similar to human and Drosophila,15,17,31 a considerable fraction (∼26%) of TCs were identified in CDS (Fig. 1E). These TCs tended to arise from regions close to the CDS start site, a tendency that was more prominent for TCs with high tag counts (Fig. 1F; this phenomenon was also observed in the human CAGE data31), and GOEAST analysis34 showed that genes with at least 5 TCs in their CDS were most enriched in the function “ribonucleotide binding.” Whether these TCs give rise to bona fide truncated peptides remains to be examined, but a recent study in the budding yeast strongly implied such a possibility.35 The percentage of intron-derived TCs was almost negligible (∼1%), owing to the relative scarcity of introns in this organism. A small fraction (∼2%) of TCs were identified within the 3′ UTR, but it is unclear whether they share functional roles with their counterparts in human and Drosophila.6,15 The average tag number of TCs differed substantially among the 4 annotated features: 1,846 tags for the 5′ UTR, 49 tags for the 3′ UTR, 111 tags for CDS and 33 tags for introns. There was a strong correlation (Spearman correlation = 0.77) between the number of tags and the width of a TC (Fig. S1).
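The clustering step is described here only in outline. As a minimal sketch, assuming single-linkage grouping of tag 5′ positions on one chromosome strand, positions separated by no more than a hypothetical `max_gap` of 20 nt are merged into one TC, and the representative tag is the position with the most tags, as the text specifies. The function name `cluster_tags` and the gap value are illustrative choices, not the study's parameters.

```python
from collections import Counter


def cluster_tags(tag_positions, max_gap=20):
    """Group same-strand tag 5' positions into tag clusters (TCs).

    tag_positions: iterable of integer coordinates, one entry per tag.
    max_gap: HYPOTHETICAL merge distance; neighbouring positions whose
             distance is <= max_gap end up in the same cluster.
    Returns one dict per TC with its span, width, tag count and
    representative (highest-count) position.
    """
    counts = Counter(tag_positions)
    clusters, current = [], []
    for pos in sorted(counts):
        # Start a new cluster when the gap to the previous position is too large.
        if current and pos - current[-1] > max_gap:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)

    result = []
    for members in clusters:
        # Representative tag: most tags; ties broken toward the leftmost position.
        rep = max(members, key=lambda p: (counts[p], -p))
        result.append({
            "start": members[0],
            "end": members[-1],
            "width": members[-1] - members[0] + 1,
            "tags": sum(counts[p] for p in members),
            "representative": rep,
        })
    return result
```

Calling it once per chromosome and strand yields, for each TC, the tag count and width whose rank correlation is reported above; the leftmost tie-break is an arbitrary choice.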
TCs located in the annotated 5′ UTR (including the 100 nt upstream of it) were designated as core promoters, because their positions agree with the transcript 5′ ends reported in previous studies.22,32,33 With this criterion, 7,859 core promoters were identified for 4,041 genes (86.3% of the 4,684 genes with an annotated 5′ UTR). Comparing nucleosome positioning data36 with our CAGE data, we found a clear nucleosome-depleted region around the representative tags, suggesting that these 7,859 TCs are reliable core promoters36 (Fig. S2A). Since H3K4me3 is a well-established hallmark of transcription initiation,37 we further examined H3K4me3 ChIP-Seq data38 and found a clear H3K4me3-enriched region immediately downstream of the representative tags, which further supports the reliability of our results38 (Fig. S3A). For the 460 genes without an annotated 5′ UTR, the TC closest to the CDS start site was identified as the core promoter if it was located within 1,000 nt (i.e., ∼95th percentile of the lengths of all annotated 5′ UTRs) upstream of the CDS start site but not within an annotated protein-coding gene body or within ±100 nt flanking the annotated TSS of a non-coding RNA (ncRNA). With this criterion, we identified 220 core promoters for 220 genes (47.8% of the 460 genes), which were also supported by the nucleosome positioning data (Fig. S2B) and the histone modification data (Fig. S3B). The average expression level of the 220 core promoters was significantly lower than that of the 7,859 core promoters [p-value < 0.01 by Wilcoxon rank sum test; the expression level of a core promoter (or a TC) was measured by the number of CAGE tags within it], which may partly explain why these 220 core promoters were missed in previous studies, whose sequencing depth near the 5′ transcript ends was relatively low.22,32,33
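The rule for genes without an annotated 5′ UTR can be expressed as a small filter. The sketch below assumes a plus-strand gene and plain genomic coordinates; the function and argument names are hypothetical. It reads the rule as "discard TCs that violate the distance or exclusion criteria, then keep the one closest to the CDS start"; applying the exclusions only after choosing the nearest TC is another possible reading.

```python
def assign_core_promoter(cds_start, tc_positions, gene_bodies, ncrna_tss,
                         max_upstream=1000, ncrna_flank=100):
    """Pick a core promoter for a plus-strand gene lacking an annotated 5' UTR.

    cds_start: coordinate of the CDS start (plus strand ASSUMED).
    tc_positions: representative-tag coordinates of same-strand TCs.
    gene_bodies: (start, end) intervals of annotated protein-coding genes.
    ncrna_tss: coordinates of annotated ncRNA TSSs.
    Returns the chosen TC position, or None if no TC qualifies.
    """
    def excluded(pos):
        in_gene_body = any(start <= pos <= end for start, end in gene_bodies)
        near_ncrna = any(abs(pos - tss) <= ncrna_flank for tss in ncrna_tss)
        return in_gene_body or near_ncrna

    # Keep TCs strictly upstream of the CDS start and within max_upstream nt.
    candidates = [p for p in tc_positions
                  if 0 < cds_start - p <= max_upstream and not excluded(p)]
    # On the plus strand the TC closest to the CDS start has the largest coordinate.
    return max(candidates) if candidates else None
```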
Redefine the 5′ UTR of S. pombe genes
The depth of our CAGE data provided an opportunity to verify and improve the 5′ UTR annotation of S. pombe genes. To do this, we first compared the CAGE-derived TCs with the annotated TSSs of the 4,684 genes (91.1% of all annotated genes) in PomBase. Among these genes, 2,178 (46.5%) were covered by at least one TC located within ±20 nt flanking their annotated TSSs. Even when each singleton was counted as a valid TC, this coverage rose only slightly, to 49.2%. However, when the CAGE-derived TCs were compared against the full annotated 5′ UTR, the number of TC-covered genes increased dramatically to 3,535 (75.5%). If a further 100 nt upstream was included, this number became 4,041 (86.3%). In addition, the majority of TCs were located within the CDS-proximal 75% of the annotated 5′ UTR. These results indicate that a sizable fraction of the currently annotated TSSs should be revised. For this purpose, we used the CAGE data to redefine the representative TSS of each gene according to the following criteria: (1) if a gene has only 1 core promoter, which was the case for the majority of S. pombe genes, the representative tag of this core promoter is defined as the representative TSS of the gene; (2) if a gene has at least 2 core promoters, the one with the highest expression level is defined as the primary promoter, and its representative tag is defined as the representative TSS; (3) if no core promoter is identified for a gene by the CAGE data, its representative TSS is unavailable. For the 4,684 genes with an annotated 5′ UTR, we identified 4,041 representative TSSs for 4,041 genes (86.3%; Table S1). Only 27 of the 4,041 representative TSSs exactly matched the currently annotated TSSs in PomBase (Fig. 2A), and the average length of the representative-TSS-defined 5′ UTRs is 194 nt, much shorter than the 308 nt defined by the currently annotated TSSs (Fig. 2B and C).
This discrepancy probably reflects the different experimental approaches used. The current annotation was largely based on RNA-Seq data sets,22 which were used to annotate the longest 5′ UTR of a given gene (based on an RNA-Seq coverage of >100 times for the 5′ UTR in ref. 22), so the 5′ UTRs of different transcript isoforms could not be distinguished. In contrast, our DeepCAGE data yield a higher-resolution TSS distribution that distinguishes transcript isoforms, which enabled the identification of the representative TSS and the redefinition of the 5′ UTR. For the 460 genes without an annotated 5′ UTR, we identified 220 representative TSSs based on the core promoters of these 220 genes (Table S2). (The flowchart of our work is shown in Fig. 2D.) Taken together, these representative TSSs redefined 4,261 5′ UTRs, covering 82.8% of S. pombe genes.
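The three criteria for the representative TSS collapse into one selection step: take the most highly expressed core promoter, which is trivially the only one when a gene has a single promoter. The sketch below assumes core promoters are supplied as (representative-tag position, tag count) pairs; the names are hypothetical, and ties between equally expressed promoters are resolved by input order, a case the text does not address. A helper computes the 5′ UTR length implied by a TSS and a CDS start.

```python
def representative_tss(core_promoters):
    """Apply the three representative-TSS criteria to one gene.

    core_promoters: list of (representative_tag_position, tag_count) tuples.
    (1) one promoter -> its representative tag;
    (2) several promoters -> representative tag of the primary (most tags) one;
    (3) no promoter -> None (representative TSS unavailable).
    """
    if not core_promoters:
        return None
    position, _ = max(core_promoters, key=lambda tc: tc[1])
    return position


def utr5_length(tss, cds_start, strand="+"):
    """5' UTR length in nt: from the TSS up to, but not including, the CDS start."""
    return cds_start - tss if strand == "+" else tss - cds_start
```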
Figure 2.

Comparison between DeepCAGE-derived representative TSS and annotated TSS. (A) Histogram of the distance between representative TSS and annotated TSS. The distance was calculated from the 4,041 genes in Table S1. The histogram peaked at +8 (nt) on the x-axis with 39 genes on the y-axis. 76.2% of representative TSSs were located within 100 nt flanking the annotated TSSs. (B) Boxplot of the lengths of 5′ UTRs defined by representative TSS and annotated TSS of the 4,041 genes. The median length of the 5′ UTRs (i.e., the black line in the middle of each box) is 178 nt for annotated TSS and 107 nt for representative TSS. (C) The correlation between representative-TSS-defined 5′ UTR lengths and annotated-TSS-defined 5′ UTR lengths. The red dashed line has a slope of 1 and an intercept of 0. (D) A flowchart briefly describing the steps to identify the representative TSS and to study the core promoter structure.
Classification of core promoters based on TSS distributions
Using a method similar to that proposed in ref. 6, we classified the core promoters with at least 100 CAGE tags each into 4 classes based on the shape of their TSS distributions: (i) single dominant peak (SP), (ii) broad with a single dominant peak (DP), (iii) broad with bi- or multi-peaks (MP), and (iv) generally broad distribution (GB) (Fig. 3). In total, 1,574 (48.9%), 591 (18.4%), 746 (23.2%) and 309 (9.6%) core promoters, corresponding to 1,495, 577, 723 and 303 genes, respectively, were classified into the SP, DP, MP and GB classes. In comparison with human,31 where the SP class accounted for less than 5% of all promoters (the data were reanalyzed under the same classification criteria), S. pombe showed a significant preference for SP promoters. This preference may be attributable to the absence of CpG islands in S. pombe, since CpG islands are significantly associated with broad promoters in mammals.5,6 Gene Ontology (GO) analysis showed that genes with SP promoters were highly enriched in “ribosome biogenesis,” among other processes (Table S3), while genes with DP promoters were predominantly associated with chromatin remodeling and modification. One of the genes with DP promoters is swi6, which encodes a major structural heterochromatin protein. This protein is involved in transcriptional silencing by binding to chromatin domains marked by methylation of histone H3. Swi6 binding dynamics have been reported to depend on growth status.39 The broad TSS distribution of the swi6 core promoter may provide a basis for its dynamic transcriptional regulation under diverse growth conditions. Genes with MP and GB promoters showed no noticeable enrichment in any biological process at GO level 4 or higher. In addition, we observed that 193 of 258 genes (74.8%) used core promoters from different classes, suggesting that the expression of their transcript isoforms could be regulated by different mechanisms.
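The actual classification criteria are given in ref. 6 and the Supplementary Text, not in this section. Purely to illustrate the logic of the 4 shape classes, the sketch below uses hypothetical thresholds: SP if one position carries at least half of the tags, GB if no position carries at least 10%, DP if the tallest such position has at least twice the tags of the next (mirroring the twofold rule used later for dominant promoters), and MP otherwise. A real implementation would also merge neighbouring positions into peaks, which is omitted here.

```python
def classify_shape(tag_counts, min_tags=100, sp_fraction=0.5, peak_fraction=0.1):
    """Assign a TC to SP, DP, MP or GB from its per-position tag counts.

    tag_counts: dict mapping position -> number of tags within one TC.
    sp_fraction and peak_fraction are HYPOTHETICAL thresholds.
    Returns None for TCs with fewer than `min_tags` tags (left unclassified).
    """
    total = sum(tag_counts.values())
    if total < min_tags:
        return None
    heights = sorted(tag_counts.values(), reverse=True)
    if heights[0] / total >= sp_fraction:
        return "SP"   # a single position carries most of the tags
    peaks = [h for h in heights if h / total >= peak_fraction]
    if not peaks:
        return "GB"   # no position stands out: generally broad
    if len(peaks) == 1 or peaks[0] >= 2 * peaks[1]:
        return "DP"   # broad, but one peak at least twice any other
    return "MP"       # two or more peaks of comparable height
```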
Figure 3.

Representative promoters of 4 promoter classes. The SP class is characterized by a sharp peak (A) that represents the majority of tags in a TC. Nearly half of the identified core promoters belong to SP class. Those in GB class do not have a single strong peak (D) and account for ∼10% of core promoters of all 4 classes. Those in DP (B) and MP (C) classes are somewhere between SP and GB classes. All the distributions are centered at the representative tags. Tag frequency has been normalized such that the total frequency adds up to 1 in each plot. The gene IDs to which the promoters are assigned are placed on top of each plot.
Core-promoter elements in S. pombe
Precise identification of representative tags also allowed reliable analysis of the characteristics of core promoters. Here we retrieved the sequences of ± 50 nt flanking the representative tags of all TCs (7,859) located in the annotated 5′ UTR (including its upstream 100 nt) of genes and examined their sequence features with WebLogo.40 We identified the consensus initiator (Inr) sequence as PyPyPuN(A/C)(C/A), where Pu is the first transcribed nucleotide (i.e., the +1 position with respect to representative tags) and “A” was the most frequently used nucleotide at the +1 position (Fig. 4A). Using TCs only in the annotated 5′ UTR or only in the upstream 100 nt yielded the same consensus sequence as in Figure 4A, whereas using TCs from other regions of the genome yielded significantly different consensus sequences (Fig. 4B and C). The S. pombe Inr is closer to the mammalian Inr (reported Inr consensus:41 PyPyAN(T/A)PyPy) than to that of S. cerevisiae (reported Inr consensus:21 NPyA(A/T)NN). Across the promoter classes, the −2 to +1 positions of the Inr consensus (i.e., PyPyPu) were unchanged, while the +2 to +4 positions varied slightly (Fig. S4). The percentage of promoters with PyPyPu as the Inr differed significantly among the 4 classes (p-value < 0.001 by proportion test), being highest for the DP class (77.2%), followed by the SP (72.7%), MP (69.2%) and GB (56.0%) classes. TTA and TTG were the 2 most used trinucleotides at the −2 to +1 positions of all 7,859 TCs, and this held for all 4 classes (Table S4). Promoters with the PyPyPu Inr had significantly higher expression levels than those without it (p-value < 0.001 by Wilcoxon rank sum test), with TTG corresponding to the highest average expression level. These promoters also had greater TC widths (p-value < 0.05 by Wilcoxon rank sum test).
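Once a representative tag is fixed, checking the Inr reduces to reading fixed offsets around it. The sketch assumes each sequence is a ±50 nt window in sense orientation with the +1 nucleotide at 0-based index 50; because there is no position 0, positions −2, −1 and +1 are consecutive characters. The helper names are hypothetical.

```python
import re
from collections import Counter

# PyPyPu at positions -2, -1, +1 (there is no position 0).
PYPYPU = re.compile(r"[CT][CT][AG]")
# Full consensus PyPyPuN(A/C)(C/A) over positions -2..+4.
INR = re.compile(r"[CT][CT][AG][ACGT][AC][AC]")


def core_trinucleotide(seq, plus1):
    """Nucleotides at -2, -1, +1; plus1 is the 0-based index of the +1 base."""
    return seq[plus1 - 2:plus1 + 1]


def has_pypypu(seq, plus1):
    """True if positions -2..+1 match PyPyPu."""
    return PYPYPU.fullmatch(core_trinucleotide(seq, plus1)) is not None


def has_full_inr(seq, plus1):
    """True if positions -2..+4 match PyPyPuN(A/C)(C/A)."""
    return INR.fullmatch(seq[plus1 - 2:plus1 + 4]) is not None


def trinucleotide_usage(windows, plus1=50):
    """Tally -2..+1 trinucleotides across windows (+1 ASSUMED at index 50)."""
    return Counter(core_trinucleotide(w, plus1) for w in windows)
```

A tally such as `trinucleotide_usage(windows).most_common(2)` is the kind of summary behind the TTA/TTG observation above.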
Figure 4.

The consensus sequence of DeepCAGE-derived core promoters. The sequences of ± 50 nt flanking representative tags were retrieved from TCs situated in (A) the annotated 5′ UTR (including its upstream 100 nt) of genes, (B) the CDS and 3′ UTR of genes, and (C) anywhere in the genome (randomly picked). The retrieved sequences were analyzed with WebLogo.40 The x-axis shows positions relative to the representative tags (+1).
In S. pombe, the TATA box is highly enriched at 25 to 32 nt upstream of the 7,859 representative tags of core promoters. Within 50 nt upstream of the representative tags, ∼62% of canonical TATA boxes (including TATATA and TATAAA5) were located 25 to 32 nt upstream of the representative tags, with position −28 as the most preferred site (Fig. 5A). Degenerate TATA boxes were enriched at almost the same location with a similar preferred site (e.g., TATANN was enriched 25 to 31 nt upstream of representative tags, with −28 as the preferred site). In this region, the most frequent TATA box was TATATA, followed by TATAAA, indicating that the preference for the canonical TATA box is conserved among species.5 This TATA-enriched region is similar to that of human41,42 but differs markedly from that of S. cerevisiae, whose TATA box is located 40–120 nt upstream of the annotated TSSs.21 Overall, only ∼8% of the 7,859 core promoters had a canonical TATA box in this region; these promoters shared the same consensus Inr sequence (Fig. 5B) but had significantly higher expression levels than the rest of the core promoters (p-value < 0.001 by Wilcoxon rank sum test). Outside this region, the canonical TATA box was nearly uniformly distributed between positions −50 and −250, and a lower TATA box frequency was observed flanking the TATA-enriched region (Fig. 5A). Of the 7,859 core promoters, ∼11% and ∼20% had at least one canonical TATA box within 50 and 100 nt upstream of their representative tags, respectively (∼31% and ∼55% when the tetranucleotide TATA was counted as a TATA box). Since the TATA box could have many more variants than those analyzed here,43 the percentage of TATA-associated promoters could be even higher than 55%. In addition, all 4 promoter classes shared a very similar TATA-enriched region with almost the same preferred site for the canonical TATA box (Fig. S5).
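The enrichment rule in the Fig. 5A legend (positions whose canonical TATA count exceeds the maximum count observed between −150 and −50) can be sketched as follows. The sketch assumes upstream sequences of at least 150 nt that end immediately before +1, and it records each hexamer at the position of its first (5′-most) base; that position convention is an assumption, since the text does not state it. Function names are hypothetical.

```python
from collections import Counter

CANONICAL_TATA = ("TATATA", "TATAAA")


def tata_position_counts(upstream_seqs, motifs=CANONICAL_TATA):
    """Count canonical TATA hexamers by TSS-relative start position.

    upstream_seqs: sense-strand sequences ending immediately before +1,
    so the last character of each sequence is position -1.
    A hexamer starting at index i of a length-n sequence is counted at i - n.
    """
    counts = Counter()
    for seq in upstream_seqs:
        n = len(seq)
        for i in range(n - 5):
            if seq[i:i + 6] in motifs:
                counts[i - n] += 1
    return counts


def enriched_positions(counts, background=(-150, -50)):
    """Positions whose count exceeds the maximum count in the background window."""
    lo, hi = background
    threshold = max((counts.get(p, 0) for p in range(lo, hi + 1)), default=0)
    return sorted(p for p, c in counts.items() if c > threshold)
```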
As expected, TATA box usage in this region differed between the SP class and the other classes: the canonical TATA box was significantly more prevalent in SP promoters (∼14%) than in any other shape class (<7.5% for each of the other 3 classes; all p-values < 0.01 by proportion test), whereas no significant difference was found among the DP, MP and GB classes (p-value > 0.05 by proportion test).
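The text reports "proportion test" p-values without naming an implementation. A standard choice for comparing k proportions is the Pearson chi-squared test on the 2 × k table of counts, sketched below with only the standard library; the chi-squared tail probability uses the textbook recurrence for integer degrees of freedom. As a worked example, the PyPyPu counts per class are reconstructed approximately from the rounded percentages and class sizes reported earlier in this section, so the statistic is indicative rather than the study's exact value.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with integer df."""
    if df % 2:
        sf, k = math.erfc(math.sqrt(x / 2)), 1   # df = 1 base case
    else:
        sf, k = math.exp(-x / 2), 2              # df = 2 base case
    # Recurrence: Q(k + 2, x) = Q(k, x) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        sf += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return sf


def proportion_test(successes, totals):
    """Pearson chi-squared test that k groups share one proportion (no continuity correction)."""
    pooled = sum(successes) / sum(totals)
    stat = 0.0
    for s, n in zip(successes, totals):
        exp_s, exp_f = n * pooled, n * (1 - pooled)
        stat += (s - exp_s) ** 2 / exp_s + ((n - s) - exp_f) ** 2 / exp_f
    return stat, chi2_sf(stat, len(totals) - 1)


# Approximate counts from the rounded percentages in the text:
# DP 77.2% of 591, SP 72.7% of 1,574, MP 69.2% of 746, GB 56.0% of 309.
totals = [591, 1574, 746, 309]
with_pypypu = [round(p * n) for p, n in zip([0.772, 0.727, 0.692, 0.560], totals)]
stat, p = proportion_test(with_pypypu, totals)
print(f"chi2 = {stat:.1f}, df = 3, p = {p:.1e}")
```

With these reconstructed counts the statistic is roughly 48 on 3 degrees of freedom, consistent with the reported p < 0.001.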
Figure 5.

The characteristics of the canonical TATA box and the new motif. (A) Distribution of the canonical TATA box. The dashed line (y = 34) marks the maximum frequency of the canonical TATA box between positions −150 and −50 with respect to the representative tags. Only positions with frequency above the dashed line were considered part of an enriched region of the canonical TATA box; in this case, positions −25 to −37 formed the TATA-enriched region for S. pombe genes. (B) The consensus sequence of TATA-containing promoters. All of these promoters had the canonical TATA box in the TATA-enriched region. (C) The new motif found by MEME in the 420 typical SP promoters. The numbers under the sequence logos are positions relative to the representative tags in the 67 core promoters.
To search for possible new motifs in the core promoters of S. pombe, we analyzed the sequences of ± 50 nt flanking the representative tags of the core promoters with MEME.44 We first performed this analysis for each promoter class but failed to identify any significant motif. We then added stricter constraints to the classification criteria (such as requiring sharper peaks for the SP class and smoother distributions for the GB class) and selected 420, 250, 117 and 104 typical promoters from the SP, DP, MP and GB classes, respectively, for motif analysis (Supplementary Text). A significant motif, CC(T/A)(T/C)(T/C/A)(A/G)CCA(A/T/C), was found in 125 of the 420 typical SP promoters (E-value < 1.4×10^−48 by MEME; Fig. 5C). This motif was located 2 nt downstream of the representative tags in 67 of the 125 promoters, and genes with these 67 promoters were most enriched in the GO biological process “signal transduction.” Although the regulatory role of this motif is still unclear, promoters with this motif showed significantly higher expression levels than the rest of the 420 typical promoters (p-value < 0.05 by Wilcoxon rank sum test) and the rest of all SP promoters (p-value < 0.001 by Wilcoxon rank sum test). No significant motif was found for the typical DP, MP and GB promoters.
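Once discovered, the downstream motif can be scanned for by translating its consensus into a regular expression. The sketch below reports every start position relative to +1, so that the phrase "located 2 nt downstream of the representative tags" (for example, a start at +3) can be checked under either reading; it assumes sense-strand windows with +1 at a known 0-based index, and the helper name is hypothetical. It is not a substitute for MEME, which found the motif de novo.

```python
import re

# CC(T/A)(T/C)(T/C/A)(A/G)CCA(A/T/C) as a regex; the zero-width lookahead
# lets overlapping occurrences be reported.
DOWNSTREAM_MOTIF = re.compile(r"(?=CC[AT][CT][ACT][AG]CCA[ACT])")


def motif_starts(seq, plus1):
    """TSS-relative start positions of motif hits at or after +1.

    seq: sense-strand window; plus1: 0-based index of the +1 nucleotide,
    so index plus1 + k corresponds to position +(k + 1).
    """
    return [m.start() - plus1 + 1 for m in DOWNSTREAM_MOTIF.finditer(seq, plus1)]
```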
Widespread alternative promoters in S. pombe
In mammalian cells, recent studies showed that a large proportion of protein-coding genes possess alternative promoters.6,45 To determine whether alternative promoters are also widely used in S. pombe, we examined the number of core promoters for each of the 4,684 genes with an annotated 5' UTR: 4,041 genes had at least 1 core promoter and 1,773 genes (37.9% of 4,684) had at least 2 core promoters (Fig. 6A). (In this study, all alternative promoters were located upstream of the CDS.) Two typical examples are shown: a gene (SPAC20G4.08) with only 1 core promoter (Fig. 6B) and another (SPAC23D3.04c) with 5 core promoters (Fig. 6C). Functional enrichment analysis of the 259 genes with at least 5 alternative promoters showed that these genes were primarily associated with membrane transport activities (Table S5). Whether S. pombe uses this complex mode of transcription initiation to cope with environmental changes in which membrane transport plays a critical role remains unknown; one possibility is that multiple promoters allow the same gene to be regulated by different pathways that share no components. These findings show that alternative promoters are already widespread even in the unicellular S. pombe, giving rise to diverse transcript isoforms and increasing the complexity of the transcriptome.
Figure 6.

Alternative promoter usage revealed by DeepCAGE. (A) Distribution of the number of core promoters identified for each gene. (B) A representative gene (SPAC20G4.08) for those using only one core promoter in the annotated 5′ UTR and its upstream 100 nt. (C) A representative gene (SPAC23D3.04c) for those using alternative promoters in the annotated 5′ UTR and its upstream 100 nt. In (B) and (C), the solid red box stands for the coding sequences and the arrows show the direction of transcription. (D) The consensus sequence of the dominant promoters. (E) The consensus sequence of the minor promoters. (F) The consensus sequence of the top 300 dominant promoters. Method of sequence analysis and notations in (D), (E) and (F) are the same as in Figure 4.
For many genes with alternative promoters, the expression level varied considerably between their primary and minor promoters. By analogy with the definition of the DP class, we defined the primary promoter of a gene as a “dominant promoter” if its tag number was at least twice that of every other promoter of the same gene. We found that dominant promoters existed in 1,516 S. pombe genes. For each of these 1,516 genes, the core promoter with the smallest number of tags was then defined as the “minor promoter.” We obtained the sequences of ± 50 nt flanking the representative tags of the dominant and minor promoters, analyzed them with WebLogo, and compared the consensus sequences of the 2 groups (Fig. 6D, 6E). For the dominant promoters, positions −2 to +1 showed a stronger PyPyPu pattern, and positions +3 to +11 next to the Inr (overlapping the position of the motif found by MEME) were enriched in cytosine, a feature that became more prominent for the top 300 dominant promoters (ranked by the number of tags in each core promoter; Fig. 6F).
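The dominant and minor promoter definitions translate directly into code. The sketch assumes a mapping from promoter identifiers to tag counts for one gene and reads "twice the tag number of any other promoter" as at least twice that of every other promoter; the function name is hypothetical.

```python
def dominant_and_minor(promoter_tags):
    """Identify a gene's dominant and minor promoters.

    promoter_tags: dict mapping promoter id -> tag count for one gene.
    The primary (highest-count) promoter is dominant if its count is at
    least twice that of every other promoter (ASSUMED reading of the rule);
    the minor promoter is then the one with the fewest tags.
    Returns (dominant_id, minor_id), or None if there is no dominant promoter.
    """
    if len(promoter_tags) < 2:
        return None
    ranked = sorted(promoter_tags, key=promoter_tags.get, reverse=True)
    primary, runner_up = ranked[0], ranked[1]
    # Comparing with the runner-up is enough: it has the most tags among the rest.
    if promoter_tags[primary] < 2 * promoter_tags[runner_up]:
        return None
    minor = min(promoter_tags, key=promoter_tags.get)
    return primary, minor
```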
Characterization of the core promoters of ncRNAs
Our DeepCAGE dataset also contained a large number of TCs associated with the ncRNA genes annotated in PomBase (genome assembly: ASM294v2; TCs overlapping annotated protein-coding genes were excluded from this analysis at the outset). We found that the gene bodies (including their upstream 100 nt) of 629 ncRNAs were covered by at least one TC. In addition, we found 393 TCs located within ±100 nt flanking the annotated TSSs of 300 of these 629 ncRNAs, and observed a nucleosome-depleted region around their representative tags (Fig. S2C) and an H3K4me3-enriched region immediately downstream of them (Fig. S3C). We assigned the 393 TCs as the putative core promoters of the 300 ncRNAs and then annotated their representative TSSs with the criteria described above (Table S6). Further analysis of the 393 promoters showed that ncRNA core promoters could also be grouped into the 4 classes using the same criteria as for protein-coding genes. However, none of the 4 classes was significantly enriched for ncRNAs compared with protein-coding genes (p-value > 0.05 by Fisher's exact test). WebLogo analysis of the ± 50 nt flanking the representative tags of the 393 core promoters showed that (i) the Inr consensus sequence of ncRNAs was very similar to that of protein-coding genes, and (ii) the canonical TATA box used by the 393 core promoters was also enriched 25 to 32 nt upstream of the representative tags, as in protein-coding genes (Fig. 7A). We further noted that 177 of the 300 ncRNAs were antisense transcripts, and the consensus sequence of their core promoters was similar to that of the other 123 ncRNAs (Fig. 7B). These similarities suggest that these 5′-capped ncRNAs may share transcriptional regulatory mechanisms with well-annotated protein-coding genes.
Figure 7.

The consensus sequence of ncRNA core promoters. (A) The consensus sequence of the 393 core promoters of the 300 ncRNAs. (B) The consensus sequence of the 239 core promoters of the 177 antisense ncRNAs. Method of sequence analysis and notations in figures are the same as in Figure 4.
Potential novel core promoters in the intergenic regions
Among the tags located in the intergenic regions (Fig. 1A), 413,631 tags (2.6% of all tags) were not associated with any annotated ncRNA gene. These tags formed 3,522 TCs in total (comprising 374,182 tags), of which 1,919 TCs (comprising 41,392 tags) had representative tags located on the antisense strand of annotated protein-coding genes (16.6% in annotated 5′ UTRs, 15.7% in 3′ UTRs, 2.8% in introns and 64.2% in CDSs). Of the 3,522 TCs, 164 had ≥100 tags each and together accounted for 315,291 tags (76.2% of the 413,631 tags; Table S7). The distribution of the 164 TCs across the 4 shape classes was almost the same as that of the core promoters (Pearson correlation > 0.99). The consensus sequence of the 164 TCs (±50 nt flanking the representative tags) was also similar to that of the core promoters in the annotated 5′ UTRs of genes (Fig. S6). In addition, a nucleosome-depleted region was observed around the representative tags of the 164 TCs (Fig. S2D), and an H3K4me3-enriched region was observed immediately downstream of them (Fig. S3D). These results suggest that the 164 TCs could be novel core promoters in the intergenic regions. Further comparison with the 3′ end data from Mata46 showed that 58 of these TCs were followed by downstream polyadenylation sites in the same intergenic regions, indicating the existence of unannotated genes in those regions (Supplementary Text).
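The agreement between two shape-class distributions can be summarized as a Pearson correlation over the 4 class fractions. The sketch below assumes that is how the comparison was set up; the helper names and the counts in the example are hypothetical placeholders, not the values behind Table S7.

```python
from math import sqrt
from typing import Dict, List, Sequence

SHAPE_CLASSES = ("SP", "DP", "MP", "GB")


def class_fractions(counts: Dict[str, int]) -> List[float]:
    """Fraction of TCs in each shape class, in a fixed class order."""
    total = sum(counts.values())
    return [counts.get(c, 0) / total for c in SHAPE_CLASSES]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical class counts, for illustration only.
intergenic = {"SP": 80, "DP": 30, "MP": 20, "GB": 34}
core = {"SP": 1900, "DP": 700, "MP": 500, "GB": 800}
print(round(pearson(class_fractions(intergenic), class_fractions(core)), 3))
```

With only 4 categories, a correlation near 1 mainly says that the two distributions have the same overall shape; it is a descriptive summary rather than a formal test.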
Discussion
In this study, we combined the CAGE technique with next-generation sequencing to profile TSSs genome-wide in S. pombe. For this important model system, an in-depth analysis of promoter structures based on the TSS landscape should provide a much-needed basis for delineating the intricate relationship between promoter structure and gene regulation, first discovered in much more complex organisms.5,6,15 The much greater depth and unprecedented resolution of DeepCAGE allowed us to precisely define the representative TSS for 4,261 (82.8%) of the protein-coding genes in PomBase, including 220 genes without any 5′ UTR information in the literature. Only sequence analysis based on the representative TSSs, and not on the annotated TSSs, allows identification of the consensus Inr sequence and the TATA-enriched region (randomly shifting the representative TSSs by 1 to 10 nt yields a consensus sequence similar to that flanking the annotated TSSs; see Fig. S7), which is further evidence that DeepCAGE has improved the 5′ UTR annotation of S. pombe. A notable finding in this regard is that most representative-TSS-defined 5′ UTRs (81.6%) of S. pombe protein-coding genes are shorter than those currently annotated in PomBase (Table S1). Compared with that of S. cerevisiae (83 nt),47 the 5′ UTR in S. pombe is more than twice as long (194 nt). Interestingly, this length is much closer to that in humans (210 nt).14 Since the 5′ UTR contains multiple regulatory elements that affect translational efficiency,14 this “human-like” 5′ UTR suggests that S. pombe may already have a comparable complexity in 5′ UTR-mediated regulation.
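Redefining a 5′ UTR from a representative TSS is a strand-aware distance to the start codon. The sketch below assumes 1-based coordinates with `start_codon` marking the A of the ATG. The helper names are hypothetical. The median is used as one reasonable summary; the text does not state which statistic underlies the 194-nt figure.

```python
from statistics import median
from typing import Iterable, Tuple


def utr5_length(tss: int, start_codon: int, strand: str) -> int:
    """5' UTR length in nt from 1-based coordinates.

    On the plus strand the UTR spans tss..start_codon-1; on the minus strand
    it spans start_codon+1..tss (the TSS has the larger coordinate).
    """
    length = start_codon - tss if strand == "+" else tss - start_codon
    if length < 0:
        raise ValueError("TSS lies downstream of the start codon")
    return length


def summarize(pairs: Iterable[Tuple[int, int]]) -> Tuple[float, float]:
    """Given (new_length, annotated_length) pairs, return the median new length
    and the fraction of genes whose new 5' UTR is shorter than annotated."""
    pairs = list(pairs)
    shorter = sum(new < old for new, old in pairs) / len(pairs)
    return median(new for new, _ in pairs), shorter
```

For example, `utr5_length(200, 150, "-")` gives 50; the sign convention is the only strand-specific step.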
One of the most intriguing findings is that S. pombe promoters can be precisely classified into the same 4 classes as in mammalian cells, albeit with somewhat different relative abundances. In particular, sharp promoters (i.e., the SP class) are much more prevalent in S. pombe than in humans, and the difference in TATA-box usage between sharp and broad promoters is much smaller in S. pombe than in humans. Although the functions associated with sharp and broad promoters in mammals are thought to accommodate complex multicellular tasks,5 a similar shape distribution also evolved in the unicellular S. pombe, where such multicellular tasks are absent (Table S3). Furthermore, we observed that a large number of genes used alternative promoters, and some of these genes used core promoters from different classes. Alternative promoters add complexity to transcriptional and post-transcriptional regulation. In mammalian cells, alternative promoters can be used to achieve cell type-specific transcription programs.48 In unicellular organisms, one might speculate that such complexity is required for the cell to respond to different environmental or nutritional conditions, a role already suggested for transcripts with alternative promoters in budding yeast.49
Consistent with other eukaryotes,6,15,17 a considerable fraction of the identified TCs mapped within the CDSs of S. pombe genes (CDS-TCs; Fig. 1E). Their average expression level (measured by tag number) was much lower than that of TCs in annotated 5′ UTRs (p-value < 0.001 by Wilcoxon rank-sum test), and their consensus sequence (±50 nt flanking the representative tags) appeared very different from that in annotated 5′ UTRs (Fig. 8A). However, the highly expressed (top 10%) CDS-TCs had a consensus sequence very similar to that in annotated 5′ UTRs (Fig. 8B). Moreover, the number of CDS-TCs classified into each shape class was highly correlated with the corresponding number in annotated 5′ UTRs (correlation coefficient: 0.97). Thus, the CDS-TCs with low expression levels may be generated by post-transcriptional processing, as found in Drosophila,15 but it is tempting to suggest that the highly expressed CDS-TCs are bona fide transcripts with their own promoters, as recently reported in budding yeast.35 Whether these transcripts encode protein isoforms or unannotated ncRNAs remains to be determined.
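The expression comparison above is a two-sample rank test on per-TC tag counts. A minimal standard-library sketch of the Wilcoxon rank-sum (Mann-Whitney U) test follows. It uses the large-sample normal approximation with tie and continuity corrections; R's `wilcox.test` switches to an exact distribution for small samples without ties, so small-sample p-values can differ. The function names and inputs are hypothetical.

```python
from collections import Counter
from math import erf, sqrt
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided rank-sum test; returns (U statistic for x, p-value)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    u = sum(average_ranks(pooled)[:n1]) - n1 * (n1 + 1) / 2
    # Variance of U under H0, reduced for tied values.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var == 0:
        return u, 1.0
    z = max(abs(u - n1 * n2 / 2) - 0.5, 0.0) / sqrt(var)  # continuity-corrected
    p = 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))
    return u, p
```

Here `x` would hold tag counts of CDS-TCs and `y` those of 5′ UTR TCs; with thousands of TCs per group, the normal approximation is appropriate.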
Figure 8.

The consensus sequence of CDS-TCs. (A) The consensus sequence of TCs situated in CDS of S. pombe genes. (B) The consensus sequence of the highly expressed (top 10%) CDS-TCs. Method of sequence analysis and notations in figures are the same as in Figure 4.
Further analysis of the 3,881 TCs with at least 100 tags each uncovered 2 interesting but underappreciated properties of the TSS distribution. First, the immediate vicinity of the representative tags displayed diverse TSS distribution patterns, which were found in every shape class. For example, in 1,485 TCs (38.3%), the representative tag position accounted for >95% of all tags within ±2 nt of it (i.e., in a 5-nt window), producing local ultra-sharp peaks (LUSP), whereas in 1,035 TCs (26.7%) this percentage was below 75%, producing local distributed peaks (LDP). Interestingly, LUSP used PyPyPu as the Inr significantly more often than LDP did (82.1% vs 43.8%, p-value < 0.001 by proportion test; this result held for all 4 classes). The most frequent trinucleotide at the PyPyPu position also differed: LUSP preferred TTA, whereas LDP preferred CCA. Second, a considerable fraction of these TCs contained sharply defined discrete peaks, even at very high tag counts (Fig. S8). Such a discrete distribution was therefore not due to sequencing depth but is an intrinsic property of transcription initiation in S. pombe genes. Although the principle that could so drastically change initiation efficiency within only a few nucleotides is not yet understood, one could speculate that the initiation site must have a precise and complex structure to enable such accurate discrimination. In fact, a closer look at human promoters (based on the data from ref. 31) suggests that this phenomenon may also exist there. These are 2 clear examples of the power of deep sequencing to uncover intricate details of the TSS landscape.
It is somewhat surprising that the 5′-capped subset of annotated ncRNAs shares certain properties with protein-coding genes, including the consensus Inr sequence, the TATA-enriched region and the promoter classes. These similarities strongly suggest that these ncRNAs, regardless of their function, may be regulated by the same mechanisms as protein-coding genes. Whether this also holds in mammalian cells warrants future examination.
Conclusion
With the first comprehensive analysis of TSSs using DeepCAGE in this organism, we have found that the unicellular S. pombe shares many fundamental properties of promoter structure with higher eukaryotes, including humans. Most intriguing are the similar usage of alternative promoters and the same diverse promoter classes, together with a similar universal Inr sequence and an optional upstream TATA box at a similar position. These attributes further establish this simpler organism as a powerful model for understanding the complex mechanisms of eukaryotic transcriptional regulation. These findings not only provide a useful basis for studying gene expression regulation in this widely used organism, but may also be invaluable for designing experiments aimed at delineating the transcriptional processes of higher eukaryotes in this much simpler system.
Materials and Methods
Yeast strain and growth conditions
We used the wild-type Schizosaccharomyces pombe 972 h− strain for all experiments. The strain was grown exponentially at 32°C in rich medium supplemented with glucose, adenine, histidine, leucine and uracil. Cells were harvested and resuspended in pre-cooled lysis buffer (10 mM EDTA, 10 mM Tris-HCl pH 8.0 and 50 mM NaOAc), and the suspension was further subjected to mechanical pulverization. The cell material was stored at −80°C.
Nucleic acid isolation and DeepCAGE library preparation
Total RNA was extracted with TRIzol reagent (Invitrogen, Carlsbad, CA) according to the manufacturer's instructions, and the isolated RNA was quantified with a 2100 Bioanalyzer. The CAGE library was prepared by the method described by Valen et al.,50 with some modifications for the Illumina GA II sequencer. Briefly, cDNA was synthesized from 50 μg total RNA using SuperScript II reverse transcriptase (Life Technologies). The cap-selected cDNA/RNA hybrids were treated with RNase ONE Ribonuclease (Promega), and the purified cDNA was ligated to a 5′ linker containing an MmeI site. After second-strand synthesis, the dsDNA fragments were digested with MmeI and then ligated to a 3′ linker. The resulting CAGE tags were amplified by 20 cycles of PCR and subjected to deep sequencing.
Quality assessment of CAGE library
rRNA constitutes ~8.5% of our cap-selected CAGE library, as estimated by the percentage of tags mapped to rRNA genes. Assuming that >90% of total RNA is rRNA, the ratio of rRNA to non-rRNA decreased about 97-fold after cap selection in our case [(0.90/0.10)/(0.085/0.915) ≈ 97]. This means that at most 1.0% (1/97) of uncapped rRNA escaped the cap-trapping procedure. Assuming that uncapped non-rRNA transcripts escape at the same rate, at most 1.0% of the CAGE tags mapped to non-rRNA genes are derived from uncapped transcripts (i.e., the FDR of each CAGE tag is 1%).
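The ~97-fold figure follows from comparing the rRNA:non-rRNA ratio before and after cap selection. The short sketch below reproduces that arithmetic; the function name is hypothetical. Because >90% is a lower bound on the rRNA share of total RNA, the computed fold is a lower bound and the escape rate an upper bound.

```python
def rrna_depletion_fold(frac_total: float, frac_library: float) -> float:
    """Fold decrease of the rRNA:non-rRNA ratio from total RNA to the CAGE library."""
    odds_total = frac_total / (1 - frac_total)        # rRNA:non-rRNA in total RNA
    odds_library = frac_library / (1 - frac_library)  # rRNA:non-rRNA among CAGE tags
    return odds_total / odds_library


fold = rrna_depletion_fold(0.90, 0.085)  # 90% is a lower bound, so fold is too
escape = 1 / fold                        # upper bound on the uncapped escape rate
print(f"{fold:.1f}-fold; escape <= {escape:.2%}")  # prints: 96.9-fold; escape <= 1.03%
```

Rounding the fold to 97 gives the 1.0% per-tag FDR used in the tag-cluster reliability analysis.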
Calculation of tag distribution
Different annotated features can overlap, so a tag may hit 2 or more features at the same time. In such cases, we assigned the tag according to the following priority: annotated 5′ UTR > 3′ UTR > CDS > intron. This priority followed the order of tag density observed in each feature (annotated 5′ UTR >> 3′ UTR > CDS > intron; tags hitting more than one feature were excluded from the density calculation). In addition, in Figure 1A, the 1,000 nt upstream (the 95th percentile of all annotated 5′ UTR lengths) of ORFs without an annotated 5′ UTR were included in “5′ UTR,” the 1,500 nt downstream (the 95th percentile of all 3′ UTR lengths) of ORFs without an annotated 3′ UTR were included in “3′ UTR,” and the 100 nt upstream of annotated 5′ UTRs were also included in “5′ UTR.” The remaining genomic regions were considered intergenic. The TC distribution (Fig. 1D) was calculated in the same way.
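The two-step rule above can be sketched as follows: derive the priority order from tag densities computed on single-feature tags only, then resolve multi-feature hits by that order. The sketch assumes density means tags per nucleotide of total feature length; the function names and the input shapes (one set of hit features per tag) are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set


def feature_priority(tag_hits: Sequence[Set[str]], feature_lengths: Dict[str, int]) -> List[str]:
    """Order features by tag density (tags per nt), using only single-feature tags."""
    counts = Counter(next(iter(hits)) for hits in tag_hits if len(hits) == 1)
    density = {f: counts[f] / length for f, length in feature_lengths.items()}
    return sorted(feature_lengths, key=lambda f: density[f], reverse=True)


def assign_feature(hits: Set[str], priority: Sequence[str]) -> str:
    """Resolve a tag that hits several features to the highest-priority one."""
    for feature in priority:
        if feature in hits:
            return feature
    return "intergenic"
```

Excluding ambiguous tags from the density step keeps the priority order from depending on the very rule it is used to define.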
Tag cluster identification and quality assessment
We used the following steps to find tag clusters in which tags (i.e., putative TSSs) were significantly enriched. First, we grouped overlapping tags on the same strand. Second, we counted the tags in sequential 10-nt windows within each tag group and considered the group a TC if any window contained at least 5 tags. We used both computational and statistical methods to assess the reliability of the TCs identified in this way. First, assuming that tags outside genes and their 500-nt flanking regions were background noise, we randomly generated tags across the whole genome at the same noise level and counted the TCs obtained with the same criterion; this number was less than 0.1% of the TCs observed in our data. Therefore, under this assumption, <0.1% of all identified TCs were generated by background noise. Second, in a TC with n tags, the number X of tags not derived from a 5′ cap follows a binomial distribution with p = 0.01 (the FDR of each CAGE tag, as shown above). For any given n (n ≥ 5), the probability that X exceeds 2 (or 5% of n, whichever is larger) is less than 1.4%. For TCs with n ≥ 100 tags, this probability falls below 0.1% (Fig. S9). Therefore, most tags in each identified TC should be derived from the 5′ cap of transcripts, which means that each identified TC should represent a putative core promoter.
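The clustering steps and the binomial check above can be sketched as follows. Several details are assumptions, not taken from the text:

- Tags have a fixed length (`TAG_LENGTH = 20` is a hypothetical value).
- Input is a per-strand map from tag 5′-end position to tag count.
- "Sequential 10-nt windows" are read as consecutive, non-overlapping windows anchored at each group's first position; sliding windows would be an equally plausible reading.

The tail probability is summed in log space so that large n does not overflow.

```python
from collections import Counter
from math import exp, lgamma, log, log1p
from typing import Dict, List

TAG_LENGTH = 20  # hypothetical CAGE tag length in nt (assumption, not from the text)


def group_overlapping(tss_counts: Dict[int, int], tag_length: int = TAG_LENGTH) -> List[List[int]]:
    """Group tag 5'-end positions on one strand whose tags overlap.

    Equal-length tags starting fewer than tag_length nt apart overlap, so a new
    group starts whenever the gap to the previous position is that large or larger.
    """
    groups: List[List[int]] = []
    for pos in sorted(tss_counts):
        if groups and pos - groups[-1][-1] < tag_length:
            groups[-1].append(pos)
        else:
            groups.append([pos])
    return groups


def is_tag_cluster(group: List[int], tss_counts: Dict[int, int],
                   window: int = 10, min_tags: int = 5) -> bool:
    """True if any consecutive window (anchored at the group start) holds >= min_tags tags."""
    per_window: Counter = Counter()
    for pos in group:
        per_window[(pos - group[0]) // window] += tss_counts[pos]
    return max(per_window.values()) >= min_tags


def find_tag_clusters(tss_counts: Dict[int, int]) -> List[List[int]]:
    return [g for g in group_overlapping(tss_counts) if is_tag_cluster(g, tss_counts)]


def prob_too_many_uncapped(n: int, p: float = 0.01) -> float:
    """P(X > max(2, 5% of n)) for X ~ Binomial(n, p), summed in log space."""
    k_min = int(max(2, n / 20)) + 1  # smallest integer strictly above the threshold
    log_c = lgamma(n + 1)
    return sum(
        exp(log_c - lgamma(k + 1) - lgamma(n - k + 1) + k * log(p) + (n - k) * log1p(-p))
        for k in range(k_min, n + 1)
    )
```

For example, `prob_too_many_uncapped(100)` evaluates the tail for a 100-tag TC; it comes out below 0.001, in line with the bound stated above.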
Classification criteria of promoter classes
We classified core promoters (with ≥100 CAGE tags each) into 4 shape classes using the following criteria. A core promoter was classified into the SP class if the distance between the 25th and 75th percentiles of its tag positions was ≤3 nt, or if the distance between the 15th and 85th percentiles was ≤5 nt. If a core promoter was not in the SP class, the ratio between its highest and second-highest peaks was >2, and the highest peak accounted for >20% of all its tags, it was classified into the DP class. If the distance between any 2 consecutive peaks (each accounting for >15% of all tags) was ≥6 nt and the core promoter was in neither the SP nor the DP class, it was classified into the MP class. If none of the above applied, the core promoter was classified into the GB class. TCs can be classified into the 4 shape classes with the same criteria. Our criteria share certain similarities with those proposed by Carninci et al.6 but yield better classification results for S. pombe. The stringent criteria for selecting typical promoters are described in the Supplementary Text.
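The shape-class rules above can be applied in order as a small decision procedure. The sketch below makes 2 assumptions that the text does not specify. First, a "peak" is a single position, with its tag count as its height. Second, percentiles use the nearest-rank definition on the tag positions, expanded by count. Other percentile definitions can shift borderline cases. The ≥100-tag filter is expected to be applied before calling it.

```python
from typing import Dict, List


def percentile(sorted_positions: List[int], q: int) -> int:
    """Nearest-rank percentile (an assumption; other definitions exist)."""
    k = max(1, (q * len(sorted_positions) + 99) // 100)  # integer ceil(q * n / 100)
    return sorted_positions[k - 1]


def classify_shape(counts: Dict[int, int]) -> str:
    """Assign SP/DP/MP/GB from per-position tag counts of one core promoter.

    Assumption: a "peak" is a single position whose height is its tag count.
    """
    total = sum(counts.values())
    positions = sorted(pos for pos, c in counts.items() for _ in range(c))
    if (percentile(positions, 75) - percentile(positions, 25) <= 3
            or percentile(positions, 85) - percentile(positions, 15) <= 5):
        return "SP"
    heights = sorted(counts.values(), reverse=True)
    top, second = heights[0], (heights[1] if len(heights) > 1 else 0)
    if (second == 0 or top / second > 2) and top / total > 0.20:
        return "DP"
    major = sorted(pos for pos, c in counts.items() if c / total > 0.15)
    if any(b - a >= 6 for a, b in zip(major, major[1:])):
        return "MP"
    return "GB"
```

Checking the rules in the stated order matters: DP and MP are only reachable for promoters that failed the SP test, and GB is the fallback.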
Nucleosome positioning data and histone modification data
We obtained nucleosome center positioning data for S. pombe from ref. 36 and calculated nucleosome occupancy, as defined in ref. 35, as the total nucleosome center positioning score within ±50 nt of every genomic location. The nucleosome occupancy around representative tags (Fig. S2) was first averaged and then normalized by the genome-average nucleosome occupancy. We obtained histone modification data (raw H3K4me3 ChIP-Seq data) for S. pombe from ref. 38, aligned the reads with Bowtie and used MACS51 with default settings to analyze H3K4me3 ChIP enrichment. The H3K4me3 enrichment around representative tags (Fig. S3) was first averaged and then normalized by the genome-wide median enrichment value.
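The occupancy and aggregate-profile steps above can be sketched with a prefix sum and a strand-aware average around anchors. The sketch assumes the following:

- Scores are a per-base list for one chromosome, with 0-based positions.
- Anchors are (position, strand) pairs for representative tags.
- Positive offsets point downstream in the transcript's direction.

The function names are hypothetical. The baseline is the genome-wide mean of the occupancy track for nucleosomes (Fig. S2) and the genome-wide median of the enrichment track for H3K4me3 (Fig. S3).

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def occupancy(scores: Sequence[float], half_width: int = 50) -> List[float]:
    """Sum of nucleosome-center scores within +/- half_width nt of each position."""
    prefix = [0.0, *accumulate(scores)]
    n = len(scores)
    return [prefix[min(n, i + half_width + 1)] - prefix[max(0, i - half_width)] for i in range(n)]


def anchored_profile(signal: Sequence[float], anchors: Sequence[Tuple[int, str]],
                     flank: int, baseline: float) -> List[float]:
    """Average signal at each offset from strand-oriented anchors, divided by baseline."""
    profile = []
    for offset in range(-flank, flank + 1):
        vals = []
        for pos, strand in anchors:
            i = pos + offset if strand == "+" else pos - offset  # positive = downstream
            if 0 <= i < len(signal):
                vals.append(signal[i])
        profile.append(sum(vals) / len(vals) / baseline if vals else float("nan"))
    return profile
```

For the nucleosome profile, one would pass `occupancy(scores)` as `signal` with `baseline = sum(occ) / len(occ)`; for H3K4me3, the enrichment track with `statistics.median` of that track as the baseline.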
Bioinformatics tools
Sequence alignment was performed with Bowtie.30 Downstream data processing and statistical analyses were performed in R (http://www.r-project.org). Sequence analyses were performed using WebLogo40 (http://weblogo.berkeley.edu), R and MEME44 (http://meme.nbcr.net/meme/cgi-bin/meme.cgi; motif searches were based only on the submitted sequences, and we did not specify a background Markov model). Functional enrichment was analyzed using GOEAST34 with all S. pombe genes as the background (http://omicslab.genetics.ac.cn/GOEAST/index.php).
Accession number
The sequencing data of this study are available in the ArrayExpress database (http://www.ebi.ac.uk/arrayexpress) under accession number E-MTAB-3188.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Acknowledgments
We thank Ningbo Zhang and Hualei Kong for their help in data analysis.
Funding
This study was supported by Longhua Medical Project of State Clinical Research Center of TCM in Longhua Hospital (LYTD-21 and JDZX2012123), State Key Development Program for Basic Research of China (2013CB967402, 2010CB529205), the Scientific Research Foundation for the Returned Overseas Chinese Scholars (State Education Ministry, China) and National Natural Science Foundation of China (11374207, 91229108 and 91019004). ZS is also supported by K.C. Wong Education Foundation (H.K.).
Authors' Contributions
ZS and XZ conceived the study and designed the experiments. JH carried out the experiments. HL carried out the data analysis. LB and YK participated in the experiments. CH and PT assisted in the data analysis. ZS, XZ and HL wrote the manuscript. All authors read and approved the final manuscript.
Supplemental Material
Supplemental data for this article can be accessed on the publisher's website.
References
- 1. Sandelin A, Carninci P, Lenhard B, Ponjavic J, Hayashizaki Y, Hume DA. Mammalian RNA polymerase II core promoters: insights from genome-wide studies. Nat Rev Genet 2007; 8:424-36; PMID:17486122; http://dx.doi.org/10.1038/nrg2026
- 2. Maston GA, Evans SK, Green MR. Transcriptional regulatory elements in the human genome. Annu Rev Genomics Hum Genet 2006; 7:29-59; PMID:16719718; http://dx.doi.org/10.1146/annurev.genom.7.080505.115623
- 3. Deaton AM, Bird A. CpG islands and the regulation of transcription. Genes Dev 2011; 25:1010-22; PMID:21576262; http://dx.doi.org/10.1101/gad.2037511
- 4. Kadonaga JT. Perspectives on the RNA polymerase II core promoter. WIREs Dev Biol 2012; 1:40-51; PMID:23801666; http://dx.doi.org/10.1002/wdev.21
- 5. Lenhard B, Sandelin A, Carninci P. Metazoan promoters: emerging characteristics and insights into transcriptional regulation. Nat Rev Genet 2012; 13:233-45; PMID:22392219
- 6. Carninci P, Sandelin A, Lenhard B, Katayama S, Shimokawa K, Ponjavic J, Semple CAM, Taylor MS, Engstrom PG, Frith MC, et al. Genome-wide analysis of mammalian promoter architecture and evolution. Nat Genet 2006; 38:626-35; PMID:16645617; http://dx.doi.org/10.1038/ng1789
- 7. Akalin A, Fredman D, Arner E, Dong X, Bryne JC, Suzuki H, Daub CO, Hayashizaki Y, Lenhard B. Transcriptional features of genomic regulatory blocks. Genome Biol 2009; 10:R38; PMID:19374772; http://dx.doi.org/10.1186/gb-2009-10-4-r38
- 8. Rach EA, Winter DR, Benjamin AM, Corcoran DL, Ni T, Zhu J, Ohler U. Transcription initiation patterns indicate divergent strategies for gene regulation at the chromatin level. PLoS Genet 2011; 7:e1001274; PMID:21249180; http://dx.doi.org/10.1371/journal.pgen.1001274
- 9. Yamashita R, Suzuki Y, Sugano S, Nakai K. Genome-wide analysis reveals strong correlation between CpG islands with nearby transcription start sites of genes and their tissue specificity. Gene 2005; 350:129-36; PMID:15784181; http://dx.doi.org/10.1016/j.gene.2005.01.012
- 10. Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, Zhang X, Wang L, Issner R, Coyne M, et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 2011; 473:43-9; PMID:21441907; http://dx.doi.org/10.1038/nature09906
- 11. Kharchenko PV, Alekseyenko AA, Schwartz YB, Minoda A, Riddle NC, Ernst J, Sabo PJ, Larschan E, Gorchakov AA, Gu T, et al. Comprehensive analysis of the chromatin landscape in Drosophila melanogaster. Nature 2011; 471:480-5; PMID:21179089; http://dx.doi.org/10.1038/nature09725
- 12. Ozsolak F, Song JS, Liu XS, Fisher DE. High-throughput mapping of the chromatin structure of human promoters. Nat Biotechnol 2007; 25:244-8; PMID:17220878; http://dx.doi.org/10.1038/nbt1279
- 13. Laurent L, Wong E, Li G, Huynh T, Tsirigos A, Ong CT, Low HM, Kin Sung KW, Rigoutsos I, Loring J, Wei CL. Dynamic changes in the human methylome during differentiation. Genome Res 2010; 20:320-31; PMID:20133333; http://dx.doi.org/10.1101/gr.101907.109
- 14. Chatterjee S, Pal JK. Role of 5′- and 3′-untranslated regions of mRNAs in human diseases. Biol Cell 2009; 101:251-62; PMID:19275763; http://dx.doi.org/10.1042/BC20080104
- 15. Hoskins RA, Landolin JM, Brown JB, Sandler JE, Takahashi H, Lassmann T, Yu C, Booth BW, Zhang D, Wan KH, et al. Genome-wide analysis of promoter architecture in Drosophila melanogaster. Genome Res 2011; 21:182-92; PMID:21177961; http://dx.doi.org/10.1101/gr.112466.110
- 16. Shiraki T, Kondo S, Katayama S, Waki K, Kasukawa T, Kawaji H, Kodzius R, Watahiki A, Nakamura M, Arakawa T, et al. Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc Natl Acad Sci U S A 2003; 100:15776-81; PMID:14663149; http://dx.doi.org/10.1073/pnas.2136655100
- 17. Ni T, Corcoran DL, Rach EA, Song S, Spana EP, Gao Y, Ohler U, Zhu J. A paired-end sequencing strategy to map the complex landscape of transcription initiation. Nat Methods 2010; 7:521-7; PMID:20495556; http://dx.doi.org/10.1038/nmeth.1464
- 18. Wood V, Gwilliam R, Rajandream MA, Lyne M, Lyne R, Stewart A, Sgouros J, Peat N, Hayles J, Baker S, et al. The genome sequence of Schizosaccharomyces pombe. Nature 2002; 415:871-80; PMID:11859360; http://dx.doi.org/10.1038/nature724
- 19. Hedges SB. The origin and evolution of model organisms. Nat Rev Genet 2002; 3:838-49; PMID:12415314; http://dx.doi.org/10.1038/nrg929
- 20. Miura F, Kawaguchi N, Sese J, Toyoda A, Hattori M, Morishita S, Ito T. A large-scale full-length cDNA analysis to explore the budding yeast transcriptome. Proc Natl Acad Sci U S A 2006; 103:17846-51; PMID:17101987; http://dx.doi.org/10.1073/pnas.0605645103
- 21. Zhang Z, Dietrich FS. Mapping of transcription start sites in Saccharomyces cerevisiae using 5′ SAGE. Nucleic Acids Res 2005; 33:2838-51; PMID:15905473; http://dx.doi.org/10.1093/nar/gki583
- 22. Rhind N, Chen Z, Yassour M, Thompson DA, Haas BJ, Habib N, Wapinski I, Roy S, Lin MF, Heiman DI, et al. Comparative functional genomics of the fission yeasts. Science 2011; 332:930-6; PMID:21511999; http://dx.doi.org/10.1126/science.1203357
- 23. Yanagida M. Gene products required for chromosome separation. J Cell Sci Suppl 1989; 12:213-29; PMID:2561424; http://dx.doi.org/10.1242/jcs.1989.Supplement_12.18
- 24. Yanagida M. The model unicellular eukaryote, Schizosaccharomyces pombe. Genome Biol 2002; 3:comment2003-comment2003.4; PMID:11897018; http://dx.doi.org/10.1186/gb-2002-3-3-comment2003
- 25. Forsburg SL. The yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe: models for cell biology research. Gravit Space Biol Bull 2005; 18:3-9; PMID:16038088
- 26. Chu Z, Li J, Eshaghi M, Karuturi RK, Lin K, Liu J. Adaptive expression responses in the Pol-gamma null strain of S. pombe depleted of mitochondrial genome. BMC Genomics 2007; 8:323; PMID:17868468; http://dx.doi.org/10.1186/1471-2164-8-323
- 27. Mizuguchi T, Fudenberg G, Mehta S, Belton JM, Taneja N, Folco HD, FitzGerald P, Dekker J, Mirny L, Barrowman J, Grewal SI. Cohesin-dependent globules and heterochromatin shape 3D genome architecture in S. pombe. Nature 2014; http://dx.doi.org/10.1038/nature13833
- 28. Wood V, Harris MA, McDowall MD, Rutherford K, Vaughan BW, Staines DM, Aslett M, Lock A, Bähler J, Kersey PJ, et al. PomBase: a comprehensive online resource for fission yeast. Nucleic Acids Res 2012; 40:D695-9; PMID:22039153; http://dx.doi.org/10.1093/nar/gkr853
- 29. Forsburg SL. The art and design of genetic screens: yeast. Nat Rev Genet 2001; 2:659-68; PMID:11533715; http://dx.doi.org/10.1038/35088500
- 30. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 2009; 10:R25; PMID:19261174; http://dx.doi.org/10.1186/gb-2009-10-3-r25
- 31. Suzuki H, Forrest AR, van Nimwegen E, Daub CO, Balwierz PJ, Irvine KM, Lassmann T, Ravasi T, Hasegawa Y, de Hoon MJ, et al. The transcriptional network that controls growth arrest and differentiation in a human myeloid leukemia cell line. Nat Genet 2009; 41:553-62; PMID:19377474; http://dx.doi.org/10.1038/ng.375
- 32. Lantermann AB, Straub T, Stralfors A, Yuan GC, Ekwall K, Korber P. Schizosaccharomyces pombe genome-wide nucleosome mapping reveals positioning mechanisms distinct from those of Saccharomyces cerevisiae. Nat Struct Mol Biol 2010; 17:251-7; PMID:20118936; http://dx.doi.org/10.1038/nsmb.1741
- 33. Dutrow N, Nix DA, Holt D, Milash B, Dalley B, Westbroek E, Parnell TJ, Cairns BR. Dynamic transcriptome of Schizosaccharomyces pombe shown by RNA-DNA hybrid mapping. Nat Genet 2008; 40:977-86; PMID:18641648; http://dx.doi.org/10.1038/ng.196
- 34. Zheng Q, Wang XJ. GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis. Nucleic Acids Res 2008; 36:W358-63; PMID:18487275; http://dx.doi.org/10.1093/nar/gkn276
- 35. Arribere JA, Gilbert WV. Roles for transcript leaders in translation and mRNA decay revealed by transcript leader sequencing. Genome Res 2013; 23:977-87; PMID:23580730; http://dx.doi.org/10.1101/gr.150342.112
- 36. Moyle-Heyrman G, Zaichuk T, Xi L, Zhang Q, Uhlenbeck OC, Holmgren R, Widom J, Wang JP. Chemical map of Schizosaccharomyces pombe reveals species-specific features in nucleosome positioning. Proc Natl Acad Sci U S A 2013; 110:20158-63; PMID:24277842; http://dx.doi.org/10.1073/pnas.1315809110
- 37. Guenther MG, Levine SS, Boyer LA, Jaenisch R, Young RA. A chromatin landmark and transcription initiation at most promoters in human cells. Cell 2007; 130:77-88; PMID:17632057; http://dx.doi.org/10.1016/j.cell.2007.05.042
- 38. DeGennaro CM, Alver BH, Marguerat S, Stepanova E, Davis CP, Bahler J, Park PJ, Winston F. Spt6 regulates intragenic and antisense transcription, nucleosome positioning, and histone modifications genome-wide in fission yeast. Mol Cell Biol 2013; 33:4779-92; PMID:24100010; http://dx.doi.org/10.1128/MCB.01068-13
- 39. Cheutin T, Gorski SA, May KM, Singh PB, Misteli T. In vivo dynamics of Swi6 in yeast: evidence for a stochastic model of heterochromatin. Mol Cell Biol 2004; 24:3157-67; PMID:15060140; http://dx.doi.org/10.1128/MCB.24.8.3157-3167.2004
- 40. Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: a sequence logo generator. Genome Res 2004; 14:1188-90; PMID:15173120; http://dx.doi.org/10.1101/gr.849004
- 41. Smale ST, Kadonaga JT. The RNA polymerase II core promoter. Annu Rev Biochem 2003; 72:449-79; PMID:12651739; http://dx.doi.org/10.1146/annurev.biochem.72.121801.161520
- 42. Buratowski S, Hahn S, Guarente L, Sharp PA. Five intermediate complexes in transcription initiation by RNA polymerase II. Cell 1989; 56:549-61; PMID:2917366; http://dx.doi.org/10.1016/0092-8674(89)90578-3
- 43. Patikoglou GA, Kim JL, Sun L, Yang SH, Kodadek T, Burley SK. TATA element recognition by the TATA box-binding protein has been conserved throughout evolution. Genes Dev 1999; 13:3217-30; PMID:10617571; http://dx.doi.org/10.1101/gad.13.24.3217
- 44. Bailey TL, Bodén M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW, Noble WS. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res 2009; 37:W202-8; PMID:19458158; http://dx.doi.org/10.1093/nar/gkp335
- 45. Landry JR, Mager DL, Wilhelm BT. Complex controls: the role of alternative promoters in mammalian genomes. Trends Genet 2003; 19:640-8; PMID:14585616; http://dx.doi.org/10.1016/j.tig.2003.09.014
- 46. Mata J. Genome-wide mapping of polyadenylation sites in fission yeast reveals widespread alternative polyadenylation. RNA Biol 2013; 10:1407-14; PMID:23900342; http://dx.doi.org/10.4161/rna.25758
- 47. Mazumder B, Seshadri V, Fox PL. Translational control by the 3′-UTR: the ends specify the means. Trends Biochem Sci 2003; 28:91-8; PMID:12575997; http://dx.doi.org/10.1016/S0968-0004(03)00002-1
- 48. FANTOM Consortium and the RIKEN PMI and CLST (DGT), Forrest AR, Kawaji H, Rehli M, Baillie JK, de Hoon MJ, Haberle V, Lassmann T, Kulakovskiy IV, Lizio M, et al. A promoter-level mammalian expression atlas. Nature 2014; 507:462-70; PMID:24670764; http://dx.doi.org/10.1038/nature13182
- 49. Pelechano V, Wei W, Steinmetz LM. Extensive transcriptional heterogeneity revealed by isoform profiling. Nature 2013; 497:127-31; PMID:23615609; http://dx.doi.org/10.1038/nature12121
- 50. Valen E, Pascarella G, Chalk A, Maeda N, Kojima M, Kawazu C, Murata M, Nishiyori H, Lazarevic D, Motti D, et al. Genome-wide detection and analysis of hippocampus core promoters using DeepCAGE. Genome Res 2009; 19:255-65; PMID:19074369; http://dx.doi.org/10.1101/gr.084541.108
- 51. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol 2008; 9:R137; PMID:18798982; http://dx.doi.org/10.1186/gb-2008-9-9-r137