Proceedings of the National Academy of Sciences of the United States of America. 2015 Mar 3;112(11):3445–3450. doi: 10.1073/pnas.1502849112

Systematic discovery of regulated and conserved alternative exons in the mammalian brain reveals NMD modulating chromatin regulators

Qinghong Yan a,1, Sebastien M Weyn-Vanhentenryck a,1, Jie Wu a, Steven A Sloan b, Ye Zhang b, Kenian Chen c,d, Jia Qian Wu c,d, Ben A Barres b,2, Chaolin Zhang a,2
PMCID: PMC4371929  PMID: 25737549

Significance

Alternative splicing (AS) plays an important role in the mammalian brain, but our atlas of AS events is incomplete. Here, we conducted comprehensive analysis of deep RNA-Seq data of mouse cortex to identify new AS events and evaluate their functionality. We expanded the number of annotated AS events more than 10-fold and demonstrated that, like many known events, thousands of newly discovered events are regulated, conserved, and likely functional. In particular, some can regulate gene expression levels through nonsense-mediated decay, a known mechanism for RNA binding protein autoregulation. Surprisingly, we discovered a number of chromatin regulators as novel targets of this mechanism, revealing a new regulatory link between epigenetics and AS that primarily emerged in the mammalian lineage.

Keywords: new alternative exon, brain transcriptome, RNA-Seq, nonsense-mediated decay, chromatin regulator

Abstract

Alternative splicing (AS) dramatically expands the complexity of the mammalian brain transcriptome, but its atlas remains incomplete. Here we performed deep mRNA sequencing of mouse cortex to discover and characterize alternative exons with potential functional significance. Our analysis expands the list of AS events over 10-fold compared with previous annotations, demonstrating that 72% of multiexon genes express multiple splice variants in this single tissue. To evaluate functionality of the newly discovered AS events, we conducted comprehensive analyses on central nervous system (CNS) cell type-specific splicing, targets of tissue- or cell type-specific RNA binding proteins (RBPs), evolutionary selection pressure, and coupling of AS with nonsense-mediated decay (AS-NMD). We show that newly discovered events account for 23–42% of all cassette exons under tissue- or cell type-specific regulation. Furthermore, over 7,000 cassette exons are under evolutionary selection for regulated AS in mammals, 70% of which are new. Among these are 3,058 highly conserved cassette exons, including 1,014 NMD exons that may function directly to control gene expression levels. These NMD exons are particularly enriched in RBPs including splicing factors and interestingly also regulators for other steps of RNA metabolism. Unexpectedly, a second group of NMD exons reside in genes encoding chromatin regulators. Although the conservation of NMD exons in RBPs frequently extends into lower vertebrates, NMD exons in chromatin regulators are introduced later into the mammalian lineage, implying the emergence of a novel mechanism coupling AS and epigenetics. Our results highlight previously uncharacterized complexity and evolution in the mammalian brain transcriptome.


Molecular diversity derived from alternative splicing (AS) is believed to be critical for the creation of different cell types and tissues with distinct physiological properties and functions (1). This is particularly relevant to the central nervous system (CNS), which requires a large protein repertoire to generate its intricate and complex neural circuits (2). Therefore, a comprehensive catalog of AS events and identification of those with potential functional significance are important steps toward understanding the complexity of the nervous system.

Over the past two decades, discovery and characterization of AS events using different technologies have provided important insights into the evolution and regulation of AS (3, 4). Earlier expressed sequence tag (EST)-based studies revealed the prevalence of AS in mammals (5). Investigation of these AS events, especially comparison of AS patterns in different species, led to an important observation that AS is rapidly evolving in mammals, with many alternative exons created after the split of primates and rodents (6). Evolutionarily recent exons in general have a low level of inclusion and frequently result in frameshifts and premature termination codons (PTCs) in the transcripts, which are presumably eliminated by nonsense-mediated decay (NMD) (7). Interestingly, these “evolutionary intermediates” are strikingly different from a subset of potentially functional alternative exons with conserved splicing patterns in different species such as human and mouse, which mostly preserve the reading frame and have a substantially elevated level of conservation in flanking intronic sequences and in the wobble positions of alternative exons (4). More quantitative analysis of the transcriptome using splicing-sensitive microarrays (8, 9) and, more recently, RNA sequencing (RNA-Seq) (10, 11) further demonstrated a large number of alternative exons under tissue-specific regulation. Interestingly, tissue-specific exons are also associated with a higher level of conservation in splicing pattern and sequences across different mammalian species (9, 10).

Importantly, RNA-Seq allows for digital profiling of the transcriptome at a much greater depth, coverage and resolution, facilitating efficient discovery of new AS events. Recent RNA-Seq studies of different mammalian tissues, including the brain, have concluded that AS will be detected in >90% of multiexon genes in mammals given sufficient sequencing depth (10, 11). However, this conclusion is based on extrapolation. The sequencing depth of each individual tissue and read length in these earlier RNA-Seq studies are relatively limited, so that a large number of hidden AS events remain to be discovered. In addition, as new exons are discovered, it will be important to know how to separate functional AS events from those representing biological or evolutionary noise.

To address these questions, we performed systematic analysis of the mouse cortex transcriptome at different developmental stages by deep RNA-Seq. The functional significance of previously known and newly discovered AS events was assessed by multiple measures, including protein coding potential, CNS cell type-specific regulation, targets of tissue-specific RNA-binding proteins (RBPs) and signatures of strong purifying selection pressure. These multidimensional analyses provided a unique opportunity to reveal the complexity of the mammalian brain transcriptome and its functional impact, which has not been fully appreciated thus far, and to prioritize a subset of AS events for further investigation.

Results

Extending the Mouse and Human Cortex Transcriptomes by Deep Sequencing.

To survey the mammalian brain transcriptome, we performed deep RNA-Seq of mouse cortex at nine developmental stages spanning E14.5 to 2 y old, which resulted in 987 million paired-end (PE) 101-nt reads. In combination with a second independent dataset from a recent study (1.88 billion single-end (SE) 101-nt reads; ref. 12), our analysis included 390 billion bases in total that provide an unprecedented depth for discovery of AS in the brain (Table S1; in comparison, the human BodyMap 2.0 dataset has 342 billion bases distributed over 16 different tissues; ref. 13).
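The total sequencing depth quoted above can be checked with simple arithmetic. The sketch below assumes that every read is exactly 101 nt and that the paired-end count refers to read pairs (two reads per fragment); the function name is illustrative and not part of the original analysis.

```python
READ_LENGTH_NT = 101  # read length stated in the text


def total_bases(paired_end_fragments, single_end_reads, read_length=READ_LENGTH_NT):
    """Approximate number of sequenced bases.

    Assumption: each paired-end fragment contributes two full-length reads.
    """
    return (2 * paired_end_fragments + single_end_reads) * read_length


# Mouse cortex: 987 million PE fragments plus 1.88 billion SE reads.
mouse_bases = total_bases(paired_end_fragments=987e6, single_end_reads=1.88e9)
print(f"mouse cortex: about {mouse_bases / 1e9:.0f} billion bases")
```

Under these assumptions the estimate comes to roughly 389 billion bases, which agrees with the approximately 390 billion bases reported.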

All RNA-Seq reads were mapped back to the reference genome and exon junctions using OLego (14). In addition to a provided comprehensive database of annotated exon junctions (referred to as known exon junctions hereafter) derived from existing gene models (RefSeq and UCSC genes), cDNAs and EST sequences, OLego searches for new exon junctions with high accuracy and sensitivity (SI Appendix, Fig. S1 and SI Results). In total, OLego detected 878,526 unique exon junctions, and a majority of them (659,592 or 75%) are newly discovered junctions, as they are not included in 250,993 known junctions currently annotated in the mouse transcriptome (Fig. 1A). Consistent with the depth of sequencing, 92% of RefSeq exon junctions were recovered, suggesting high sensitivity of junction detection, including in genes with low expression.

Fig. 1.

Discovery of new exon junctions and AS events by deep sequencing analysis of the mouse cortex transcriptome. (A) Number of known and new exon junctions. (B) Number of known and new cassette exons. Cassette exons with ≥2 supporting reads in each AS junction are considered to be reproducibly alternatively spliced in the brain and shown as a separate group. (C) The Nrxn1 gene structure, RNA-Seq read profile and a sashimi plot showing detected exon junctions (≥2 supporting reads). Exon numbers are labeled below the gene structure. Three alternative promoters are indicated above the read coverage profile. (D) Selected AS events in the Nrxn1 gene. Solid and dotted lines represent known and new AS variants, respectively. The number of reads in the adult cortex is indicated for each exon junction. Variants of low abundance are not shown for clarity. (E) A previously uncharacterized promoter (γ isoform, also indicated in C) produces a truncated form of Nrxn1. RNA read coverage at each developmental stage of the cortex is shown together with CAGE tags supporting the presence of the promoter. Extensive Rbfox binding downstream of the first exon at conserved (U)GCAUG elements is evident from Rbfox CLIP tags.

To detect AS events, we developed a computational pipeline that combines RNA-Seq and cDNA/EST sequences (SI Appendix, Fig. S2 and SI Methods). This analysis detected 602,701 AS events of different types (single and tandem cassette exons, alternative 5′ and 3′ splice sites and mutually exclusive exons). These events were compared with a comprehensive database of 33,795 annotated AS events identified using only cDNA/EST sequences denoted known AS events (SI Appendix, SI Results). Inclusion of the RNA-Seq data allows identification of 568,906 new AS events from at least 13,748 genes, representing a >10-fold increase compared with known events (Fig. 1B and Table S2). Among these, 25,474 known and 421,396 new events have both isoforms reproducibly detected in the cortex (≥2 reads supporting each of the exon junctions), suggesting that 72% of multiexon genes have multiple AS isoforms detectable in this single tissue (SI Appendix, SI Results).
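The reproducibility criterion above (≥2 reads supporting each exon junction of both isoforms) can be sketched for a single cassette exon as follows. This is a minimal illustration, not the pipeline described in SI Methods: the function names, the three-junction representation, and the junction-averaged inclusion level are assumptions made for the example.

```python
MIN_JUNCTION_READS = 2  # threshold stated in the text


def is_reproducibly_spliced(upstream_inclusion, downstream_inclusion, skipping,
                            min_reads=MIN_JUNCTION_READS):
    """True if every junction of both isoforms has at least min_reads reads.

    A cassette exon has two inclusion junctions (upstream and downstream)
    and one skipping junction joining the flanking exons.
    """
    return min(upstream_inclusion, downstream_inclusion, skipping) >= min_reads


def inclusion_level(upstream_inclusion, downstream_inclusion, skipping):
    """Fraction of junction reads supporting exon inclusion.

    Assumption: the two inclusion junctions are averaged so that inclusion
    and skipping are each represented by one junction count.
    Returns None when no junction reads are available.
    """
    inclusion = (upstream_inclusion + downstream_inclusion) / 2
    total = inclusion + skipping
    return inclusion / total if total > 0 else None


print(is_reproducibly_spliced(12, 9, 3))
print(inclusion_level(10, 10, 30))
```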

For comparison, we also analyzed the human cortex transcriptome using an RNA-Seq dataset consisting of 977 million PE and 180 million SE 101-nt reads (12) (216 billion bases in total; Table S1). A similar expansion in the number of exon junctions (564,364 new vs. 327,444 known junctions) and AS events (479,217 new vs. 90,543 known events) was observed (Table S2 and SI Appendix, Fig. S3 and SI Results). Therefore, new exon junctions and AS events are exceedingly pervasive in the mammalian brain transcriptome.

To demonstrate how the transcriptome complexity is revealed by our analysis, we focused on the Neurexin gene family, which encodes presynaptic cell-adhesion molecules important for synaptic formation. Three members of the family have been studied in detail and are estimated to generate over 3,000 variants through the use of two alternative promoters (α and β isoforms) and extensive AS in each family member (15) (Fig. 1 C–E and SI Appendix, Figs. S4–S6). For example, Nrxn1 was reported to have AS in six different regions of the gene. Our RNA-Seq analysis detected and quantified all these previously described AS events, as well as many additional ones (Fig. 1 C and D), including an alternative exon denoted AS6 that was observed recently (15). Unexpectedly, we observed a third promoter, which we refer to as γ promoter, in a highly conserved region between exons 23 and 24 that would generate a truncated isoform lacking almost all extracellular domains (Fig. 1E). To our knowledge, this isoform has not been previously characterized, but it is supported by ESTs (SI Appendix, Fig. S4) and CAGE (cap-analysis gene expression) tags, which mark the 5′ end of mRNA molecules (16) (Fig. 1E). Interestingly, the use of this promoter increases during brain development (Fig. 1E). The brain- (and muscle-) specific RBP Rbfox binds extensively downstream of the alternative first exon via a cluster of conserved (U)GCAUG elements as evidenced by a large number of cross-linking and immunoprecipitation (CLIP) tags that capture in vivo protein-RNA interactions (17), although the functional significance of this interaction awaits further investigation.

For more detailed characterization of known and newly discovered AS events, we decided to focus on cassette exons, the most prevalent type of AS in mammals. In total, our analysis identified 146,705 new cassette exon events (compared with 16,034 known events; Fig. 1B). Many cassette exons have alternative 5′ or 3′ splice sites or are spliced to different flanking exons, resulting in multiple cassette events overlapping with each other. To avoid potential over-counting, we conservatively defined a nonredundant subset by grouping overlapping cassette exons and selecting a representative for each group (SI Appendix, SI Methods). This filtering resulted in 13,500 known and 64,450 new cassette exons that were used for further analysis. The 64,450 newly discovered nonredundant cassette exons can be further divided into three groups: 36,225 (56.2%) cases representing newly discovered skipping of known exons, 3,490 (5.4%) cassette exons overlapping with known exons with altered exon boundaries, and 24,735 (38.4%) cassette exons without overlapping known exons or gene models in the previous annotations.
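The grouping step described above can be illustrated with a simple interval-merging sketch. The actual rule for choosing a representative is given in SI Methods; here the record type `CassetteExon`, the half-open coordinates, and the choice of the exon with the most supporting reads are all assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical record; coordinates are half-open [start, end).
CassetteExon = namedtuple("CassetteExon", "chrom strand start end reads")


def nonredundant_subset(exons):
    """Group overlapping cassette exons and keep one representative per group.

    Exons on the same chromosome and strand are chained into a group while
    each one overlaps the span covered so far. The representative is the exon
    with the most supporting reads (an illustrative choice).
    """
    ordered = sorted(exons, key=lambda e: (e.chrom, e.strand, e.start, e.end))
    representatives = []
    group, group_key, group_end = [], None, None
    for exon in ordered:
        key = (exon.chrom, exon.strand)
        if group and key == group_key and exon.start < group_end:
            group.append(exon)
            group_end = max(group_end, exon.end)
        else:
            if group:
                representatives.append(max(group, key=lambda e: e.reads))
            group, group_key, group_end = [exon], key, exon.end
    if group:
        representatives.append(max(group, key=lambda e: e.reads))
    return representatives


example = [
    CassetteExon("chr1", "+", 100, 150, 10),
    CassetteExon("chr1", "+", 100, 160, 25),  # alternative 5' splice site
    CassetteExon("chr1", "+", 300, 350, 5),
    CassetteExon("chr2", "+", 120, 140, 7),
]
for exon in nonredundant_subset(example):
    print(exon)
```

In this toy input the two chr1 exons starting at position 100 collapse into one group, so three representatives remain.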

Coding Capacity and Evolutionary History of Known and New AS Events.

To prioritize AS events for further characterization, we compared known and newly discovered AS events using several different measures relevant for functional significance. We first developed a pipeline to systematically evaluate their alternative coding capacity (Table S3 and SI Appendix, Figs. S7–S9, SI Results and SI Methods). For both known and new cassette exons that are evaluable, a substantial fraction (46% and 60%, respectively) introduces frameshifts or in-frame stop codons in the alternative exons so that they are expected to trigger NMD upon exon inclusion (NMD_in) or exclusion (NMD_ex) (Fig. 2). When we quantified exon inclusion levels, we found that for a majority of AS events (68% and 85% of known and new events, respectively), the minor isoform has low abundance (<10%) and is frequently targeted by NMD (SI Appendix, Fig. S10 A and B). We also examined the proportion of cassette exons with orthologous sequences including intact splice sites in human, and how this proportion changes with respect to the exon inclusion level. A majority of known and new cassette exons with low inclusion level (<10%) do not have discernible orthologs in human (74% and 85%, respectively; SI Appendix, Fig. S10C), suggesting they most likely arose after the split of mouse and human during evolution. Finally, 35% of known and 17% of new cassette exons in mouse have conserved splicing patterns known or newly discovered in human (Table S4). Similar results were obtained in the analysis of the complete set of cassette exons in mouse or using human cassette exons as a reference in comparison with mouse exons (SI Appendix, Figs. S10 D–I and S11 and SI Results).
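The two exon-level signals named above for NMD upon inclusion, a frameshift or an in-frame stop codon, can be sketched as follows. This is a deliberately simplified illustration rather than the authors' pipeline (SI Methods): it ignores codons that span exon boundaries, the position of the stop codon relative to the last exon junction, and the NMD_ex case. The function names and the `phase` convention are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_in_frame_stop(exon_seq, phase):
    """True if the exon contains a stop codon in the reading frame.

    phase: number of leading exon bases (0, 1 or 2) that complete a codon
    started in the upstream exon; the first full codon begins at this offset.
    """
    seq = exon_seq.upper()
    for i in range(phase, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return True
    return False


def may_trigger_nmd_on_inclusion(exon_seq, phase):
    """Flag an exon whose inclusion shifts the frame or adds an in-frame stop."""
    frameshift = len(exon_seq) % 3 != 0
    return frameshift or has_in_frame_stop(exon_seq, phase)


print(may_trigger_nmd_on_inclusion("GCCTAAGCC", phase=0))  # in-frame TAA
print(may_trigger_nmd_on_inclusion("ATGGCC", phase=0))     # frame-preserving, no stop
```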

Fig. 2.

Classification of known and new cassette exons with respect to alternative coding capacity. Nonredundant cassette exons were included for this analysis. We analyzed known and new exons (all exons and exons with ≥2 reads for each junction in the minor isoform) to determine whether they could cause NMD upon inclusion (NMD_in) or exclusion (NMD_ex), or produce protein products from both isoforms (coding).

Implications from the observations above are threefold. First, based on the alternative coding capacity, abundance of the minor isoform, and presence of orthologous exons or conserved AS patterns between human and mouse, both known and new AS events represent a mixture of those that are potentially functional and those that are tolerated during evolution without affecting the fitness of the organism, confirming previous observations (4). Second, as evidenced by the quantitative differences between known and new AS events in the measures described above, a larger proportion of new AS events than known events might represent evolutionary intermediates or noisy splicing products. Third, given that a much larger number of new events were observed in total, the absolute number of new events that are likely functional might be comparable to or even greater than the number of known events. Following these arguments, we went on to estimate the number of new AS events that are potentially functional and identify them by taking several complementary approaches.

Cell Type- or Tissue-Specific Regulation of Known and New AS Exons.

Brain-specific exons make up a distinct group that is frequently conserved across mammalian species (18). Despite the fact that the brain is one of the tissues with the most cellular heterogeneity, few studies have addressed cell type-specific splicing in the CNS. We recently performed RNA-Seq to compare all major cell types in the CNS that are acutely isolated from mouse cortex, including neurons, astrocytes, oligodendrocytes at different maturation stages (oligodendrocyte progenitor cells, newly formed and myelinating oligodendrocytes), microglia, and endothelial cells (19). Each of these cell types has distinct molecular signatures at the gene expression and splicing levels, although previous analysis of AS was limited to known events. We therefore extended the analysis to both known and newly discovered cassette exons and identified 3,113 nonredundant cassette exons with cell type-specific splicing, including 1,095 (35%) new events (Fig. 3A). These new AS events are distributed over different cell types, with the largest number in neurons.

Fig. 3.

Known and new cassette exons under tissue- or cell type-specific regulation. In each panel, only nonredundant cassette exons were included for analysis. (A) The number of known and new cassette exons with cell type-specific AS. MO, myelinating oligodendrocyte; NFO, newly formed oligodendrocyte; OPC, oligodendrocyte precursor cells. (B) An RNA map correlating neuron-specific exon inclusion (red) or exclusion (blue) with the position of (U)GCAUG elements recognized by Rbfox proteins. (C) The number of known and new cassette exons with altered splicing upon CNS-specific depletion of Ptbp2 (driven by Nestin-Cre, middle bar). For each group, exons are further classified with respect to their alternative coding capacity. (D) An RNA map correlating Ptbp2-dependent exon inclusion (red) or exclusion (blue) with the position of Ptbp2 binding as measured by Ptbp2 CLIP tags.

To obtain insights into the underlying regulatory mechanisms of these known and newly discovered neuron-specific exons, we investigated their regulation by several cell type-specific RBPs, with an initial focus on Rbfox and Nova, whose specific enrichment in neurons compared with the other cell types was confirmed in the RNA-Seq data (SI Appendix, Fig. S12). Rbfox and Nova are known to regulate AS of neuronal transcripts by activating or repressing exon inclusion depending on their binding position relative to the alternative exon (17, 20, 21). We found that both known and new cassette exons with neuron-specific inclusion showed characteristic and strong enrichment of Rbfox binding motifs or CLIP tags (17) in the downstream introns (Fig. 3B and SI Appendix, Fig. S13), which is consistent with activation of these exons in neurons by Rbfox. Similarly, Nova binding sites are enriched downstream of exons with neuron-specific inclusion and upstream of exons with neuron-specific exclusion for both known and new cassette exons (22) (SI Appendix, Fig. S14).

The analysis above highlights the role of RBPs in determining cell type–specific splicing. To further investigate the specific regulatory effects of RBPs more directly, we examined targets of individual tissue- or cell type-specific RBPs for which deep RNA-Seq data are available to show splicing changes upon their depletion in the CNS. This also allows us to identify known and new AS exons that are under such regulation as a measure of functionality. We first analyzed targets of Ptbp2, an RBP that primarily represses exon inclusion in the nervous system (23), using RNA-Seq datasets that compared wild-type (WT) and Ptbp2 knockout (KO) brains (24). Conditional depletion of Ptbp2 in the CNS altered splicing of 653 nonredundant cassette exons, including 249 newly discovered exons (38%); a similar proportion (42%) was observed upon conditional depletion of Ptbp2 in the forebrain (Fig. 3C and SI Appendix, Fig. S15 A–D). To determine whether these splicing changes reflect direct regulation by Ptbp2, we correlated Ptbp2-dependent splicing with Ptbp2 binding sites as determined by CLIP (23). Both known and new cassette exons with Ptbp2-dependent exclusion have characteristic enrichment of Ptbp2 CLIP tags in the poly-pyrimidine tract, suggesting that the splicing changes reflect direct regulation by Ptbp2 (Fig. 3D). We also analyzed an RNA-Seq dataset that profiled mouse hippocampi upon depletion of Mbnl2, an RBP implicated in myotonic dystrophy (25). Again, we observed that a sizeable proportion (53 of 223 or 23%) of Mbnl2-dependent exons are newly discovered (SI Appendix, Fig. S15 E–G).

Known and New AS Exons Under Strong Evolutionary Selection Pressure.

Because it is currently impractical to directly identify all exons regulated by specific RBPs, we explored an alternative approach to infer functionality by evaluating AS events under evolutionary selection in the mammalian lineage. We scored conservation of the wobble positions of the alternative exons together with flanking intronic sequences in 40 sequenced mammalian species, which achieves high statistical power for detecting selection pressure in mammals while avoiding bias toward either coding or noncoding exons (SI Appendix, Fig. S16, SI Results, and SI Methods).

We initially considered a group of NMD_in exons with conserved AS in human and mouse which have no overlap with any coding sequences. In other words, the function of these exons was maintained simply to induce NMD upon exon inclusion during ∼75 million years of evolution, so they provided a positive control set of functional AS events. As a negative control set, we used exons constitutively spliced in both human and mouse. We found that conservation scores of the wobble positions and intronic sequences together were able to largely separate the positive and negative control exons (Fig. 4A). We then examined the nonredundant subset of known and new cassette exons in mouse with both orthologous sequences and intact splice sites in human. A Gaussian mixture model (GMM) was used to decompose these exons into two populations: one under selection pressure similar to constitutive exons (C1 population) and the other showing distinct signatures of negative or positive selection to maintain AS (C2 population) (Fig. 4B and SI Appendix, SI Methods). We estimate that 7,645 cassette exons (16%) are under selection pressure driven by regulated AS, including 2,287 (30%) known and 5,358 (70%) newly discovered AS exons. These evolutionarily constrained AS events account for 17% (2,287 of 13,500) and 8% (5,358 of 64,450) of all nonredundant cassette exons that are previously known or newly discovered, respectively. These observations highlight both the tolerance of AS events without apparent evolutionary fitness, especially among those new AS events, and the prevalence of previously uncharacterized, hidden exons under functional selection.
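To make the mixture decomposition concrete, the following is a minimal sketch of fitting a two-component Gaussian mixture by expectation-maximization. It is a one-dimensional simplification on simulated scores, whereas the analysis above uses two conservation scores (wobble positions and flanking introns) and the model described in SI Methods. The function names, initialization, and simulated data are assumptions for illustration only.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and SD sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(xs, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns (weights, means, sds), with component 0 initialized at the lower
    quartile and component 1 at the upper quartile of the data.
    """
    n = len(xs)
    ordered = sorted(xs)
    means = [ordered[n // 4], ordered[(3 * n) // 4]]
    grand_mean = sum(xs) / n
    grand_sd = math.sqrt(sum((x - grand_mean) ** 2 for x in xs) / n)
    sds = [grand_sd, grand_sd]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in xs:
            p = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([0.5, 0.5] if total == 0.0 else [p[0] / total, p[1] / total])
        # M-step: update weight, mean and SD of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sds[k] = max(math.sqrt(var), 1e-6)  # guard against collapse
            weights[k] = nk / n
    return weights, means, sds


# Simulated scores: a background-like population and a more conserved one.
random.seed(0)
scores = ([random.gauss(0.0, 1.0) for _ in range(600)]
          + [random.gauss(5.0, 1.0) for _ in range(400)])
weights, means, sds = fit_two_gaussians(scores)
print([round(w, 2) for w in weights], [round(m, 2) for m in means])
```

With well-separated simulated populations such as these, the fitted weights play the role of the estimated proportion of exons in each population (analogous to C1 and C2).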

Fig. 4.

Identification of cassette exons under evolutionary selection pressure. (A) Conserved NMD_in exons (orange, positive control of functional AS exons) and constitutive exons (gray, negative control of AS exons) can be largely separated by conservation in the wobble positions of the alternative exon (y axis) and flanking intronic sequences (x axis). (B) Distribution of nonredundant cassette exons with respect to conservation scores as displayed by smooth scatter plot. The distribution can be decomposed into two populations (C1 and C2), with the percentage of exons in each population indicated. The “+” signs indicate the mean conservation scores of each population of exons. The dashed lines indicate two SDs from the means, the threshold used to detect strong purifying selection pressure. The first quadrant, marked with the bold black lines, indicates exons under strong purifying selection. (C) The number of nonredundant known and new cassette exons under strong purifying selection. (D and E) AS of the Rbfox2 gene in the region flanking exon 6 in mouse (D) and human (E). Two highly conserved cryptic NMD_in exons (e5* and e6*) were identified de novo in this region. These exons were recently demonstrated to trigger NMD upon inclusion (30). An additional NMD_in exon (e5#) with a very low level of inclusion (<0.1% compared with the major isoform with inclusion of exon 6) was identified in mouse. (F) Cross-species sequence conservation of e5#. Splice sites and predicted premature termination codons (PTCs) are indicated.

To identify AS events with potentially conserved function in mammals, we used stringent thresholds on conservation scores to define a subset of 3,058 nonredundant, highly conserved cassette exons, including 1,640 (53%) newly discovered cassette exons (Fig. 4 B and C and Table S5, and SI Appendix, SI Methods). These exons are substantially smaller than constitutively spliced exons (85 nt vs. 114 nt, median), a feature known to be associated with regulated AS exons (4). In particular, 272 AS exons (including 117 newly discovered) have a size ≤ 27 nt, and such microexons have been shown recently to be particularly relevant to neurodevelopment (14, 26, 27). Therefore, deep survey of the transcriptome allows us to more than double the number of alternative exons that are likely functional.

For instance, the Rbfox2 gene has a 93-nt cassette exon (denoted e6) encoding part of the RNA-recognition motif (RRM), which can be skipped to produce a dominant negative form (28, 29). Two cryptic exons (e5* and e6*) around e6 were identified recently by detailed analysis of the region, and inclusion of these exons upon overexpression of Rbfox3 was demonstrated to trigger NMD (30). Our genome-wide analysis detected both exons de novo in both human and mouse and correctly annotated them as NMD_in exons, despite the very low inclusion level (<0.5% in each species; Fig. 4 D and E). In addition, we discovered a paralogous exon of e6* in the Rbfox1 gene in both human and mouse (SI Appendix, Fig. S17), suggesting the ancient origin of this exon, before duplication of the Rbfox gene family. Interestingly, another NMD_in exon (e5#) was discovered at an extremely low level in a conserved region between e5 and e5* in mouse Rbfox2 (Fig. 4D). Although conservation of this exon is below the stringent threshold we used to define strong purifying selection, the conserved nature of the splice sites and the presence of two conserved PTCs suggest its potential functional significance (Fig. 4F).

In total, 1,014 cassette exons under strong purifying selection (390 known and 624 new) are expected to trigger NMD upon exon inclusion or exclusion. NMD exons overlapping with highly conserved sequences were previously found in ubiquitous splicing factors such as SR and hnRNP proteins, core spliceosomal proteins, and, in some cases, tissue-specific splicing factors (31–33), suggesting an autoregulatory mechanism for homeostatic maintenance of the splicing machinery. Our analysis confirmed and extended these observations by identifying 118 highly conserved NMD exons in 84 RBPs, accounting for 21% of all annotated RBPs (34) (Table S5 and SI Appendix, Figs. S18 and S19 and SI Results). Interestingly, in addition to splicing factors, we identified NMD exons in RBPs involved in other steps of RNA metabolism such as 3′-end processing and translational regulation (Table S6; see examples in SI Appendix, Fig. S19 and SI Results). Therefore, our analysis suggests that regulation of RBP expression through NMD is more widespread than previously recognized.

AS-NMD Exons in Chromatin Regulators and Their Evolutionary Origin.

To investigate the function of highly conserved AS-NMD exons in a more unbiased manner, we performed gene ontology (GO) analysis. This analysis highlighted genes involved in “mRNA metabolic process” (Benjamini FDR < 4.7 × 10^−18), as expected. Unexpectedly, a second group of 52 genes encoding chromatin regulators, highly significantly enriched in annotations such as “chromatin organization” and “chromatin modification” (Benjamini FDR < 4.1 × 10^−8 and 1.0 × 10^−6), was also identified (Tables S7 and S8 and SI Appendix, Figs. S20 and S21 and SI Results).
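For readers unfamiliar with the multiple-testing correction reported above, the sketch below computes adjusted P values, assuming that “Benjamini FDR” refers to the standard Benjamini–Hochberg procedure. The function name and the toy P values are illustrative; the enrichment tool and settings actually used are described in SI Methods.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Each P value is scaled by m / rank (rank 1 = smallest), then a running
    minimum from the largest rank downward enforces monotonicity.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example with four GO terms (made-up P values).
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print([round(q, 3) for q in adjusted])
```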

Finally, we investigated the evolutionary origin of NMD exons under strong selection in mammals. We focused on those NMD_in exons without overlap with protein-coding exons, because these exons may have arisen de novo during evolution. For comparison, we used constitutive exons and highly conserved alternative coding and NMD_ex exons (Fig. 5 A–E). Although conservation of a majority of constitutive and NMD_ex exons extends into lower vertebrates (Fig. 5 C and D), suggesting their ancient origin, conservation of de novo NMD_in exons is largely limited to placental mammals (Fig. 5 A and E), and alternative coding exons are in between (Fig. 5 B and E). Intriguingly, there appears to be a transition period during the evolution from ancestors of fish to amphibians and land animals when these NMD_in exons were created, likely derived a fitness advantage, and were fixed in the mammalian lineage (Fig. 5E). Two observations corroborate this finding. First, the size of introns flanking de novo NMD_in exons is roughly half that of those flanking other groups of exons (∼1500 vs. 2500–3000 nt, median; Fig. 5F), which is consistent with the notion that introduction of a de novo NMD_in exon splits an ancestral intron into two. Second, among the 34 exons (from 32 genes) with extended conservation into lower vertebrates, a majority (21 exons) are in genes involved in RNA metabolism, suggesting the ancient origin of the autoregulatory mechanism (Table S9 and SI Appendix, SI Results). In contrast, most, if not all, de novo NMD_in exons in genes currently annotated as chromatin regulators appear to have evolved in ancestors of placental mammals (Fig. 5A), potentially correlated with more sophisticated epigenetic regulation of gene expression in these species.

Fig. 5.

Evolutionary origin of highly conserved de novo NMD_in exons. (A–D) Cross-species conservation of different categories of cassette exons under strong purifying selection in mammals is shown (A–C). Constitutive exons (D) are used as a control. In each panel, the heatmap indicates the conservation of each exon (rows) in each of the 60 vertebrate species (columns) ordered based on the phylogenetic tree (i.e., divergence from mouse, which is shown in the first column; placental mammals and nonmammalian vertebrates are underlined). A gray box in the heatmap indicates the presence of the exon in the species. Note that only exons with sequences conserved in mouse and human were included for the analysis. In A, exons from genes encoding RBPs or chromatin regulators are indicated on the right. (E) The percentage of conserved cassette exons across 60 vertebrate species. The phylogenetic tree of the species is shown on the top. The shaded species indicate the evolutionary period during which de novo NMD_in exons were created and finally fixed in mammals. (F) Median size of the introns flanking cassette exons under strong purifying selection pressure. Exons are grouped by their coding capacity. Error bars represent the robust estimate of the SEM.

Discussion

A first step to understand the complexity of the mammalian transcriptome is to reveal the atlas of AS events and prioritize the events that are likely functional. To our knowledge, this work represents the most comprehensive survey of AS in the mammalian brain in both sequencing depth and the range of developmental stages included, which allows us to expand the list of AS events by over an order of magnitude. Based on current knowledge, there is no indication that a majority of the observed AS events are functional, regardless of whether they were previously identified in cDNAs/ESTs or are only detectable by deep sequencing, although the latter are more likely to include leaky splicing products. On the other hand, previous annotations based on cDNAs/ESTs may cover only a minority of all functional events due to the limited depth of the data. Our analysis demonstrated that newly discovered AS exons from deep sequencing account for 23–42% of cassette exons under tissue- or cell type-specific regulation. This proportion is likely an underestimate because new AS exons are more likely found in genes of low abundance, e.g., due to restricted expression in specific cell types or clearance of the transcripts through NMD. Splicing regulation for these exons is more difficult to determine. Indeed, by analysis of evolutionary signature, we estimated that a majority (70%) of cassette exons under selection pressure in mammals were not annotated previously. Among highly conserved cassette exons, newly discovered events are more enriched in NMD targets compared with known events (53% vs. 30%, among exons with assigned categories). Many of these NMD exons reside in RBPs including, but not limited to, splicing factors, implying an extensive network of autoregulation and cross-regulation of RBPs involved in essentially all steps of RNA metabolism. The detected inclusion level of some of these NMD exons (e.g., those in Rbfox1/2 genes; Fig. 4 D–F) is extremely low (<1%), and yet the high level of cross-species sequence conservation, independent detection in both mouse and human, and increased abundance upon inhibition of the NMD pathways strongly argue for their functional significance. An intriguing question, which we currently cannot answer, is whether these exons serve only for fine-tuning homeostatic RBP expression or for regulating expression more dramatically in certain conditions. A potentially revealing experiment, for example, will be transcriptome analysis upon perturbation of tissue-specific splicing regulators in combination with suppression of the NMD pathway.
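To make the notion of a very low inclusion level concrete, the sketch below estimates exon inclusion from junction-spanning read counts. This simple junction-count estimator is an assumption chosen for illustration; the quantification actually used in this study is described in SI Appendix, SI Methods. The function name `inclusion_level` and the read counts are hypothetical.

```python
def inclusion_level(upstream_jn: int, downstream_jn: int, skip_jn: int) -> float:
    """Estimate cassette exon inclusion (a fraction between 0 and 1).

    upstream_jn / downstream_jn: reads spanning the two exon-inclusion junctions.
    skip_jn: reads spanning the exon-skipping junction.
    Inclusion reads are averaged over the two inclusion junctions so that they
    are comparable to the single skipping junction.
    """
    if min(upstream_jn, downstream_jn, skip_jn) < 0:
        raise ValueError("read counts must be non-negative")
    inclusion_reads = (upstream_jn + downstream_jn) / 2.0
    total = inclusion_reads + skip_jn
    if total == 0:
        raise ValueError("no junction reads cover this exon")
    return inclusion_reads / total


# Made-up counts: very few inclusion reads against many skipping reads.
psi = inclusion_level(upstream_jn=3, downstream_jn=5, skip_jn=996)
print(f"{psi:.1%}")  # 0.4%
```

For an NMD exon, a steady-state estimate of this kind reflects only the transcripts that escape degradation, which is why increased abundance upon NMD inhibition is informative about how often the exon is actually spliced in.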

In addition to a greatly expanded list of AS exons that are likely functional, our study led to an unexpected finding revealing the widespread regulation of chromatin regulators through NMD. The impact of chromatin structure and its modifications on AS has been suggested by a number of recent studies (35). Our analysis implies another side of the coupling between the two processes, which is much less characterized. To our knowledge, this is the first time the general implication of AS-NMD in chromatin regulators has been suggested. In support of this observation, a recent study noted changes in splicing of NMD exons in several genes encoding histone modifying enzymes during terminal erythropoiesis (36). Regulation of chromatin genes through AS-NMD can potentially provide a feedback mechanism that links epigenetic regulation to RNA processing. Importantly, our phylogenetic analysis tracing the evolutionary origin of these NMD exons suggests that this mechanism was likely introduced in the mammalian lineage during evolution, which is a more recent invention than AS as a mechanism for diversifying the proteome.

Materials and Methods

Wild-type (WT) C57BL/6 male mice at nine different ages from E14.5 to 21 mo were used for sample collection. Total RNA was extracted from whole cortices of individual animals, with duplicates at each age. RNA-Seq libraries were prepared using the TruSeq RNA Sample Preparation Kit V2 (Illumina) and PE 2 × 101-nt reads were generated using the Illumina HiSeq 2000. This dataset has been deposited into NCBI Sequence Read Archive (SRP055008).

In total, we analyzed seven RNA-Seq datasets in this study: (i) mouse developing cortex as described above (PE 101-nt reads); (ii) mouse developing frontal cortex (E11 to 22 mo, SE 101-nt reads) (12); (iii) human developing middle/frontal cortex (PE/SE 101-nt reads) (12); (iv) CNS cell types (PE 101-nt reads) (19); (v) Ptbp2 WT vs. KO brains (PE 100-nt reads) (24); (vi) Mbnl2 WT vs. KO hippocampi (PE 40-nt reads) (25); (vii) Upf2 WT vs. KO liver (SE 75-nt reads) (37). In particular, discovery of new exon junctions and AS events in the mouse and human cortex transcriptomes was based on the first two datasets and the third dataset, respectively. A summary of samples in each dataset is provided in Table S1.

Details of bioinformatics analysis, including RNA-Seq read mapping, detection, annotation, and quantification of AS events, and analysis of evolutionary selection pressure, are provided in SI Appendix, SI Methods. Additional information is provided at the complementary website zhanglab.c2b2.columbia.edu/index.php/Cortex_AS.

Supplementary Material

Supplementary File
pnas.1502849112.sd01.xlsx (56.2KB, xlsx)
Supplementary File
pnas.1502849112.sd02.xlsx (45.4KB, xlsx)
Supplementary File
pnas.1502849112.sd03.xlsx (50.9KB, xlsx)
Supplementary File
pnas.1502849112.sd04.xlsx (34.9KB, xlsx)
Supplementary File
pnas.1502849112.sd05.xlsx (797.7KB, xlsx)
Supplementary File
pnas.1502849112.sd06.xlsx (51.4KB, xlsx)
Supplementary File
pnas.1502849112.sd07.xlsx (55.4KB, xlsx)
Supplementary File
pnas.1502849112.sd08.xlsx (29.6KB, xlsx)
Supplementary File
pnas.1502849112.sd09.xlsx (15.2KB, xlsx)

Acknowledgments

We thank R. B. Darnell for encouragement and M. Herre for technical assistance during the early stages of the project, and Columbia Genome Center for sequencing of RNA-Seq libraries. This study was supported by National Institutes of Health (NIH) Grants R00GM95713 (to C.Z.); R01MH09955501 and R01NS08170301 (to B.A.B.); Simons Foundation Autism Research Initiative Grants 297990 and 307711 (to C.Z.); T32GM007365 with additional support from Stanford School of Medicine and its Medical Scientist Training Program (to S.S.) and a Berry Postdoctoral Fellowship (to Y.Z.).

Footnotes

The authors declare no conflict of interest.

Data deposition: The sequences reported in this paper have been deposited in the Sequence Read Archive (SRA) database (accession no. SRP055008).

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1502849112/-/DCSupplemental.

References

  • 1. Nilsen TW, Graveley BR. Expansion of the eukaryotic proteome by alternative splicing. Nature. 2010;463(7280):457–463. doi: 10.1038/nature08909.
  • 2. Ule J, Darnell RB. RNA binding proteins and the regulation of neuronal synaptic plasticity. Curr Opin Neurobiol. 2006;16(1):102–110. doi: 10.1016/j.conb.2006.01.003.
  • 3. Blencowe BJ. Alternative splicing: New insights from global analyses. Cell. 2006;126(1):37–47. doi: 10.1016/j.cell.2006.06.023.
  • 4. Xing Y, Lee C. Alternative splicing and RNA selection pressure – evolutionary consequences for eukaryotic genomes. Nat Rev Genet. 2006;7(7):499–509. doi: 10.1038/nrg1896.
  • 5. Kan Z, States D, Gish W. Selecting for functional alternative splices in ESTs. Genome Res. 2002;12(12):1837–1845. doi: 10.1101/gr.764102.
  • 6. Modrek B, Lee CJ. Alternative splicing in the human, mouse and rat genomes is associated with an increased frequency of exon creation and/or loss. Nat Genet. 2003;34(2):177–180. doi: 10.1038/ng1159.
  • 7. Maquat LE. Nonsense-mediated mRNA decay: Splicing, translation and mRNP dynamics. Nat Rev Mol Cell Biol. 2004;5(2):89–99. doi: 10.1038/nrm1310.
  • 8. Castle JC, et al. Expression of 24,426 human alternative splicing events and predicted cis regulation in 48 tissues and cell lines. Nat Genet. 2008;40(12):1416–1425. doi: 10.1038/ng.264.
  • 9. Sugnet CW, et al. Unusual intron conservation near tissue-regulated exons found by splicing microarrays. PLOS Comput Biol. 2006;2(1):e4. doi: 10.1371/journal.pcbi.0020004.
  • 10. Wang ET, et al. Alternative isoform regulation in human tissue transcriptomes. Nature. 2008;456(7221):470–476. doi: 10.1038/nature07509.
  • 11. Pan Q, Shai O, Lee LJ, Frey BJ, Blencowe BJ. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet. 2008;40(12):1413–1415. doi: 10.1038/ng.259.
  • 12. Lister R, et al. Global epigenomic reconfiguration during mammalian brain development. Science. 2013;341(6146):1237905. doi: 10.1126/science.1237905.
  • 13. Farrell CM, et al. Current status and new features of the Consensus Coding Sequence database. Nucleic Acids Res. 2014;42(Database issue):D865–D872. doi: 10.1093/nar/gkt1059.
  • 14. Wu J, Anczuków O, Krainer AR, Zhang MQ, Zhang C. OLego: Fast and sensitive mapping of spliced mRNA-Seq reads using small seeds. Nucleic Acids Res. 2013;41(10):5149–5163. doi: 10.1093/nar/gkt216.
  • 15. Treutlein B, Gokce O, Quake SR, Südhof TC. Cartography of neurexin alternative splicing mapped by single-molecule long-read mRNA sequencing. Proc Natl Acad Sci USA. 2014;111(13):E1291–E1299. doi: 10.1073/pnas.1403244111.
  • 16. Carninci P, et al. FANTOM Consortium RIKEN Genome Exploration Research Group and Genome Science Group (Genome Network Project Core Group) The transcriptional landscape of the mammalian genome. Science. 2005;309(5740):1559–1563. doi: 10.1126/science.1112014.
  • 17. Weyn-Vanhentenryck SM, et al. HITS-CLIP and integrative modeling define the Rbfox splicing-regulatory network linked to brain development and autism. Cell Reports. 2014;6(6):1139–1152. doi: 10.1016/j.celrep.2014.02.005.
  • 18. Merkin J, Russell C, Chen P, Burge CB. Evolutionary dynamics of gene and isoform regulation in Mammalian tissues. Science. 2012;338(6114):1593–1599. doi: 10.1126/science.1228186.
  • 19. Zhang Y, et al. An RNA-sequencing transcriptome and splicing database of glia, neurons, and vascular cells of the cerebral cortex. J Neurosci. 2014;34(36):11929–11947. doi: 10.1523/JNEUROSCI.1860-14.2014.
  • 20. Ule J, et al. An RNA map predicting Nova-dependent splicing regulation. Nature. 2006;444(7119):580–586. doi: 10.1038/nature05304.
  • 21. Zhang C, et al. Defining the regulatory network of the tissue-specific splicing factors Fox-1 and Fox-2. Genes Dev. 2008;22(18):2550–2563. doi: 10.1101/gad.1703108.
  • 22. Zhang C, et al. Integrative modeling defines the Nova splicing-regulatory network and its combinatorial controls. Science. 2010;329(5990):439–443. doi: 10.1126/science.1191150.
  • 23. Licatalosi DD, et al. Ptbp2 represses adult-specific splicing to regulate the generation of neuronal precursors in the embryonic brain. Genes Dev. 2012;26(14):1626–1642. doi: 10.1101/gad.191338.112.
  • 24. Li Q, et al. The splicing regulator PTBP2 controls a program of embryonic splicing required for neuronal maturation. eLife. 2014;3:e01201. doi: 10.7554/eLife.01201.
  • 25. Charizanis K, et al. Muscleblind-like 2-mediated alternative splicing in the developing brain and dysregulation in myotonic dystrophy. Neuron. 2012;75(3):437–450. doi: 10.1016/j.neuron.2012.05.029.
  • 26. Li YI, Sanchez-Pulido L, Haerty W, Ponting CP. RBFOX and PTBP1 proteins regulate the alternative splicing of micro-exons in human brain transcripts. Genome Res. 2015;25(1):1–13. doi: 10.1101/gr.181990.114.
  • 27. Irimia M, et al. A highly conserved program of neuronal microexons is misregulated in autistic brains. Cell. 2014;159(7):1511–1523. doi: 10.1016/j.cell.2014.11.035.
  • 28. Damianov A, Black DL. Autoregulation of Fox protein expression to produce dominant negative splicing factors. RNA. 2010;16(2):405–416. doi: 10.1261/rna.1838210.
  • 29. Baraniak AP, Chen JR, Garcia-Blanco MA. Fox-2 mediates epithelial cell-specific fibroblast growth factor receptor 2 exon choice. Mol Cell Biol. 2006;26(4):1209–1222. doi: 10.1128/MCB.26.4.1209-1222.2006.
  • 30. Dredge BK, Jensen KB. NeuN/Rbfox3 nuclear and cytoplasmic isoforms differentially regulate alternative splicing and nonsense-mediated decay of Rbfox2. PLoS ONE. 2011;6(6):e21585. doi: 10.1371/journal.pone.0021585.
  • 31. Ni JZ, et al. Ultraconserved elements are associated with homeostatic control of splicing regulators by alternative splicing and nonsense-mediated decay. Genes Dev. 2007;21(6):708–718. doi: 10.1101/gad.1525507.
  • 32. Lareau LF, Inada M, Green RE, Wengrod JC, Brenner SE. Unproductive splicing of SR genes associated with highly conserved and ultraconserved DNA elements. Nature. 2007;446(7138):926–929. doi: 10.1038/nature05676.
  • 33. Saltzman AL, et al. 2008. Regulation of multiple core spliceosomal proteins by alternative splicing-coupled nonsense-mediated mRNA decay. Mol Cell Biol MCB.00361-00308.
  • 34. Cook KB, Kazan H, Zuberi K, Morris Q, Hughes TR. RBPDB: A database of RNA-binding specificities. Nucleic Acids Res. 2011;39(Database issue) suppl 1:D301–D308. doi: 10.1093/nar/gkq1069.
  • 35. Luco RF, Allo M, Schor IE, Kornblihtt AR, Misteli T. Epigenetics in alternative pre-mRNA splicing. Cell. 2011;144(1):16–26. doi: 10.1016/j.cell.2010.11.056.
  • 36. Pimentel H, et al. A dynamic alternative splicing program regulates gene expression during terminal erythropoiesis. Nucleic Acids Res. 2014;42(6):4031–4042. doi: 10.1093/nar/gkt1388.
  • 37. Weischenfeldt J, et al. Mammalian tissues defective in nonsense-mediated mRNA decay display highly aberrant splicing patterns. Genome Biol. 2012;13(5):R35. doi: 10.1186/gb-2012-13-5-r35.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
