Abstract
Long noncoding RNAs (lncRNAs) that map to intragenic regions of the human genome with the same (intronic lncRNAs) or opposite orientation (antisense lncRNAs) relative to protein-coding mRNAs have been largely dismissed from biochemical and functional characterization due to the belief that they are mRNA precursors, byproducts of RNA splicing or simply transcriptional noise. In this work, we used a custom microarray to investigate aspects of the biogenesis, processing, stability, evolutionary conservation, and cellular localization of ∼6,000 intronic lncRNAs and ∼10,000 antisense lncRNAs. Most intronic (2,903 of 3,427, 85%) and antisense lncRNAs (4,945 of 5,214, 95%) expressed in HeLa cells showed evidence of 5′ cap modification, compatible with their transcription by RNAP II. Antisense lncRNAs (median t1/2 = 3.9 h) were significantly (p < 0.0001) more stable than mRNAs (median t1/2 = 3.2 h), whereas intronic lncRNAs (median t1/2 = 2.1 h) comprised a more heterogeneous class that included both stable (t1/2 > 3 h) and unstable (t1/2 < 1 h) transcripts. Intragenic lncRNAs display evidence of evolutionary conservation, have little/no coding potential and were ubiquitously detected in the cytoplasm. Notably, a fraction of the intronic and antisense lncRNAs (13 and 15%, respectively) were expressed from loci at which the corresponding host mRNA was not detected. The abundances of a subset of intronic/antisense lncRNAs were correlated (|r| ≥ 0.8) with those of genes encoding proteins involved in cell division and DNA replication. Taken together, the findings of this study contribute novel biochemical and genomic information regarding intronic and antisense lncRNAs, supporting the notion that these classes include independently transcribed RNAs with the potential to exert regulatory functions in the cell.
Keywords: antisense lncRNAs, eukaryotic transcription, intronic lncRNAs, RNA stability, RNA subcellular localization
Introduction
The advent of high-throughput analyses of the human genome has led to the discovery that most of the genome (80%) is transcribed into RNAs, whereas less than 2% is translated into proteins.1 Several classes of noncoding RNAs have been identified that are transcribed from intergenic or intragenic genomic regions and that can be operationally classified by length as short (<200 nt) or long noncoding RNAs (>200 nt; lncRNAs).2 LncRNAs can be spliced or unspliced, and their expression can be tissue- and cell-type specific.3,4 Some lncRNAs show distinct subcellular localization patterns3,5,6 and have been associated with different diseases.2,7 An increasing number of studies have reported that lncRNAs have significant functions, not only at the transcriptional level but also post-transcriptionally, regulating alternative splicing, nuclear trafficking, mRNA stability and translation.8-11
Given the importance of lncRNAs in the regulation of different cellular processes, the control of lncRNA metabolism itself is likely critical for modulating its cellular function. The equilibrium between the rates of RNA synthesis and degradation determines RNA abundance in the steady state;12,13 however, neither of these processes is fully understood for lncRNAs. A recent study has shown that DCP2 plays a role in lncRNA decay, independent of all known regulators of this decapping holoenzyme, representing a unique pathway for lncRNA turnover.14 Similarly, while capped and polyadenylated mRNAs are transcribed by RNA polymerase II (RNAP II), different studies have shown that there is a fraction of RNAP II-transcribed lncRNAs that may lack a 5′ cap structure, lack a poly-A tail or undergo an alternative method of 3′-end processing.15-17
While much recent attention has focused on intergenic lncRNAs (lincRNAs), intragenic noncoding transcripts are also widely expressed in mammalian cells. Intronic regions cover over 20% of the human genome, producing a variety of regulatory RNA transcripts, including small nucleolar RNAs (snoRNAs), microRNAs and diverse types of long noncoding RNAs, such as ciRNAs, antisense lncRNAs and intronic lncRNAs (sense transcripts).18-20 Previous work in our laboratory involving analysis of unspliced noncoding ESTs in public databases has revealed that at least 74% of the annotated protein-coding loci in the human genome accumulate noncoding transcripts in the sense or antisense orientation relative to the mRNA.19 A recent report has provided additional evidence that intragenic RNAs constitute the major component of the mammalian ncRNA transcriptome,21 corroborating the notion that lncRNAs spanning intronic regions are more than just disposable parts of pre-mRNAs.21,22
In the present study, we used a custom oligoarray platform to investigate aspects of the biogenesis, stability and subcellular localization of 6,322 intronic and 10,482 antisense monoexonic lncRNAs that map to 5,411 protein-coding loci. We have shown that the majority (85–95%) of these transcripts are synthesized by RNAP II and are 5′-capped, suggesting the occurrence of independent transcriptional events. The presence of a 5′ cap contributes positively to lncRNA stability, and the small fraction (∼15%) of intronic lncRNAs that are devoid of a 5′ cap are less stable (half-life of < 3 h), possibly comprising intron lariats that accumulate in cells. Despite the lack of coding potential, the majority of intronic and antisense lncRNAs have been detected in both the nucleus and cytoplasm, pointing to potential new roles of these transcripts in the regulation and organization of cytoplasmic processes.
Results
Antisense and intronic lncRNAs are transcribed by RNA Pol II and undergo 5′ cap modification
To investigate the biogenesis of monoexonic intronic and antisense lncRNAs, we treated HeLa cells with an RNAP II inhibitor, α-amanitin, for 9 h and evaluated the effect of the blocking of transcription on their expression levels (Fig. S1). We used a re-annotated version of a previously described custom-designed 44k intron-exon oligoarray19 that interrogates 6,322 intragenic transcripts with the same orientation as the respective mRNAs (intronic lncRNAs), as well as 10,482 transcripts with an antisense orientation relative to the mRNAs (antisense lncRNAs) and 6,360 protein-coding mRNAs (Fig. 1A) (see Methods for details). We noted that only a small fraction of the intronic (1.1 and 2.4%, respectively) and antisense (1.9 and 9.4%, respectively) lncRNAs present in the oligoarray matched those in the GENCODE3 and NONCODE23 catalogs (Table S3). Despite the limited overlaps with these lncRNA catalogs, it is interesting to note that ∼90% of the intronic and antisense lncRNAs present in our microarray were also detected with coverage of ≥ 70% in at least one cell line in the strand-specific ENCODE/CSHL long RNAseq dataset 24 (Fig. S2).
Figure 1.
Stability and subcellular localization of lncRNAs and mRNAs. (A) Schematic representation of the intronic (red) and antisense (blue) lncRNAs and protein-coding mRNAs (green) present in the intron-exon oligoarray. (B) The distribution of half-lives of the lncRNAs and mRNAs following transcription inhibition with actinomycin D at different time points (1, 3, 6, and 8 h). (C) Aliquots of total RNA were digested with 5′-phosphate-dependent exonuclease (5′-exo) with or without tobacco acid pyrophosphatase (TAP) pre-treatment to release the 5′ cap and render the RNA susceptible to 5′ exonuclease digestion. The numbers of intronic lncRNAs, antisense lncRNAs and mRNAs with or without a 5′ cap (y-axis) were determined by identifying those that were significantly affected by digestion with both enzymes (TAP+/5′-exo+) compared with controls digested with 5′-exo only (TAP−/5′-exo+), and these numbers were plotted according to RNA stability (x-axis). The data reflect results obtained from 4 independent replicates of the 5′ cap assay. (***) Fisher's test, p < 0.0002. (D) Relative distributions of lncRNAs and mRNAs in nuclear (Nc) and cytoplasmic (Cyt) fractions of HeLa cells, as determined by 2 independent replicate measurements.
Similar to the protein-coding mRNAs, all intronic and antisense lncRNAs showed significantly reduced expression levels following α-amanitin treatment (SAM test, q-value threshold ≤0.1%) (Fig. S1B), confirming that these lncRNAs were transcribed by RNAP II, as previously reported.19 Furthermore, using publicly available RNAP II ChIP-seq data (see Methods for details), we observed the significant enrichment of RNAP II binding at the predicted transcription start site (TSS) of a fraction of the intronic and antisense lncRNAs compared with control sets of random sequences with similar sizes, lengths and genomic contexts (Kolmogorov-Smirnov (KS) test, p < 0.05) (Fig. S3A). For a comparison, we performed the same analysis using well-annotated intragenic (intronic and antisense) and intergenic lncRNAs from the GENCODE catalog and obtained comparable results (Fig. S3B).
The m7-GpppN cap modification that characterizes the 5′ end of RNAP II-transcribed RNAs is important to promote efficient pre-mRNA processing, mRNA export and mRNA translation. Moreover, the association of the 5′ cap with cap-binding proteins stabilizes and protects pre-mRNAs and mRNAs against 5′-to-3′ exonucleolytic activities.25 To investigate the presence of a 5′ cap in intragenic lncRNAs, total RNA aliquots were digested with 5′-phosphate-dependent exonuclease (5′-exo) following pre-treatment with tobacco acid pyrophosphatase (TAP) to release the 5′ cap and render the RNA susceptible to 5′ exonuclease digestion (TAP+/5′-exo+). A parallel reaction was performed without pre-treatment with TAP (TAP−/5′-exo+). The fraction of 5′-capped transcripts was estimated by calculating the ratio of the signal in the TAP+/5′-exo+ sample to that in the TAP−/5′-exo+ sample. The efficiency of the enzymatic treatments was confirmed by measuring the presence of a 5′ cap in capped (c-Myc and α tubulin) and uncapped (snRNA U15A) transcripts by qPCR (Fig. S4A, B, respectively). Next, TAP+/5′-exo+- and TAP−/5′-exo+-treated RNA aliquots were hybridized to the custom intron-exon 44k oligoarray. As expected, we observed that nearly all of the protein-coding mRNAs (5,546 of 5,621, 99%) were significantly affected by the TAP+/5′-exo+ treatment, exhibiting decreased abundance relative to the TAP−/5′-exo+ control (SAM, q-value ≤ 0.1%). Using the same criterion, we determined that a 5′ cap was present in almost all of the antisense lncRNAs (4,945 of 5,214, 95%) and in most of the intronic lncRNAs (2,903 of 3,427, 85%). The higher percentage (15%) of uncapped transcripts among the intronic lncRNAs compared with the antisense lncRNAs suggests the presence of intron lariats within this set. Randomly selected candidates were tested by qPCR to confirm the array data (Fig. S4C).
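The TAP+/5′-exo+ to TAP−/5′-exo+ comparison described above can be illustrated with a minimal Python sketch. It is not the procedure used in this study, in which significance was assessed with SAM; the function names, the paired-replicate layout and the fixed log2 cutoff of −1 are assumptions introduced only for illustration.

```python
import math

def cap_log2_ratio(tap_plus, tap_minus):
    """Mean log2 ratio of TAP+/5'-exo+ to TAP-/5'-exo+ signal over paired replicates.

    A strongly negative value means that removing the cap with TAP exposed
    the transcript to the exonuclease, i.e. the transcript was capped.
    """
    ratios = [math.log2(p / m) for p, m in zip(tap_plus, tap_minus)]
    return sum(ratios) / len(ratios)

def looks_capped(tap_plus, tap_minus, max_log2_ratio=-1.0):
    """Hypothetical call: capped if the mean log2 ratio is at or below the cutoff.

    The cutoff is an illustrative assumption; the study used a SAM q-value
    threshold rather than a fixed fold change.
    """
    return cap_log2_ratio(tap_plus, tap_minus) <= max_log2_ratio

# Hypothetical intensities (arbitrary units) from 4 paired replicates.
capped = looks_capped([100, 120, 90, 110], [400, 480, 360, 440])
uncapped = looks_capped([400, 480, 360, 440], [400, 480, 360, 440])
print(capped, uncapped)
```

Working on the log2 scale keeps a 4-fold loss and a 4-fold gain symmetric around zero, which is why the ratio is averaged after the log transform.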
CAGE tag analysis using public data confirmed the statistically significant enrichment of 5′ cap in the predicted TSSs of the intragenic lncRNAs compared with control random sequences (KS test, p < 0.05) (Fig. S3C). Similar results were obtained for intragenic and intergenic lncRNAs from GENCODE (Fig. S3D).
Stability of intronic and antisense lncRNAs
LncRNAs from intragenic regions of protein-coding loci, especially those with the same orientation as the respective mRNA, may represent pre-mRNAs or intron lariats, which are expected to have a short turnover compared with processed RNAs that accumulate in cells. To evaluate the stability of intragenic lncRNAs relative to protein-coding mRNAs, we blocked transcription with actinomycin D in HeLa cells in culture and measured the changes in RNA levels after 1, 3, 6 and 8 h (Fig. S5A). Valid half-life measurements were calculated for a total of 5,690 transcripts, including 791 intronic and 695 antisense lncRNAs and 4,204 mRNAs (Fig. S5A). The half-lives of randomly selected intronic and antisense lncRNAs were independently measured by qPCR to validate the array data (Fig. S6). Notably, only transcripts that were expressed at levels of > 2-fold above the background level were used in half-life analysis to avoid the misinterpretation of microarray measurements because small changes in intensity after transcription blocking may be artifactually considered as evidence of high stability.26 We also normalized the data to correct for bias introduced by the apparent relative increases in the abundances of the most stable RNAs during the actinomycin D treatment 26,27 (see Methods for a detailed description of data processing). After scaling the intensities of all transcripts at each time point by the corresponding normalization factor, we calculated the half-lives of 5,874 transcripts using 2 different decay models, one-phase exponential decay and linear regression.27 Next, for each transcript, we selected the decay model that returned the best fit and applied an R² > 0.7 cutoff value, which resulted in 5,690 valid transcript measurements, as shown in Fig. S5A.
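The model selection step described above (fit 2 decay models, keep the better fit, require R² > 0.7) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the pipeline used in this study: the exponential model is fitted here by least squares on log-transformed intensities rather than by nonlinear regression, a 0 h baseline is assumed, and all function names and example values are hypothetical.

```python
import math

def _linfit(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def _r2(ys, preds):
    """Coefficient of determination on the original intensity scale."""
    my = sum(ys) / len(ys)
    ss_tot = sum((y - my) ** 2 for y in ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    return 1.0 - ss_res / ss_tot

def half_life(times, levels, r2_cutoff=0.7):
    """Return (model, t_half, r2) for the better-fitting model, or None.

    None is returned when the transcript does not decay or when the best
    fit does not exceed the R^2 cutoff.
    """
    candidates = []
    # One-phase exponential decay: ln(y) = ln(y0) - k*t, so t1/2 = ln(2)/k.
    a, b = _linfit(times, [math.log(y) for y in levels])
    if b < 0:
        preds = [math.exp(a + b * t) for t in times]
        candidates.append(("exponential", math.log(2) / -b, _r2(levels, preds)))
    # Linear decay: y = a + b*t reaches a/2 at t = -a/(2*b).
    a, b = _linfit(times, levels)
    if b < 0 and a > 0:
        preds = [a + b * t for t in times]
        candidates.append(("linear", -a / (2 * b), _r2(levels, preds)))
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c[2])
    return best if best[2] > r2_cutoff else None

# Hypothetical transcript that halves every 3 h after transcription is blocked.
times = [0, 1, 3, 6, 8]
levels = [100 * 2 ** (-t / 3) for t in times]
print(half_life(times, levels))
```

Comparing both models by R² on the original intensity scale keeps the two fits on the same footing, since an R² computed on log-transformed data is not directly comparable with one computed on raw intensities.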
We then compared the stabilities of the intronic and antisense lncRNAs relative to the mRNAs (Fig. 1B; Fig. S5B). Interestingly, the antisense lncRNAs showed greater stability than the mRNAs (median t1/2 = 3.9 h vs. 3.2 h, respectively; one-way ANOVA p < 0.0001) (Fig. 1B; Fig. S5B). The intronic lncRNAs (median t1/2 = 2.1 h) were less stable than the mRNAs and antisense lncRNAs (one-way ANOVA p < 0.0001) (Fig. 1B; Fig. S5B). Notably, we observed a bimodal distribution of intronic lncRNA half-lives, with a subset showing greater stability, comparable to those of the antisense lncRNAs and mRNAs (Fig. 1B). These results suggest that intronic RNAs constitute a more heterogeneous class of lncRNAs, possibly comprising bona fide lncRNAs transcribed independently from and in the same orientation as mRNAs at the same locus, as well as unprocessed pre-mRNA fragments and spliced intron lariats. In fact, almost all uncapped intronic lncRNAs are also short-lived (half-life < 3 h) (Fig. 1C), corroborating the notion that this set comprises intron lariats that are not stabilized by 5′ cap modification.
To confirm that the high stability observed for the intragenic lncRNAs was not an artifact caused by their low expression levels, we investigated the association between RNA abundance and half-life (Fig. S7) and observed a low but positive correlation (r ≤ 0.26; p < 0.0001) between them, disfavoring the possibility that the low lncRNA expression levels accounted for the apparently high stability of these transcripts.
As an additional control to evaluate the biological robustness of our RNA stability data, we analyzed the over-representation of specific Gene Ontology (GO) terms assigned to the 4,204 mRNAs for which valid half-life measurements were calculated and correlated them with the observed transcript stabilities. We found that transcripts encoded by genes associated with “regulation of cell cycle progression” and “ubiquitin ligase complex” had median half-lives that were significantly (p < 0.05) shorter than those of all of the mRNAs (median t1/2 < 3.2 h), whereas the mRNAs of genes with the GO terms “respiratory electron transport chain” and “integral to ER membrane” were significantly more stable (median t1/2 >3.2 h) (Fig. S8). We also analyzed the enrichment of GO categories among the stable mRNAs (t1/2 ≥ 3 h; n = 2,680) and short-lived mRNAs (t1/2< 3 h; n = 1,524) (Table S1). We observed that the functional categories related to DNA transcription and replication were over represented among the short-lived mRNAs, while the categories related to the mitochondrion and endoplasmic reticulum were associated with the long-lived mRNAs, consistent with findings from published studies.26-30
Subcellular localization of intronic and antisense lncRNAs
To investigate the subcellular distribution of lncRNAs, we hybridized a custom-designed 44 k oligoarray with total RNA isolated from nuclear or cytoplasmic fractions prepared from HeLa cells. The purity of the subcellular fractions was confirmed by detection of nuclear (32S rRNA precursor and histone H3 protein) and cytoplasmic (GAPDH protein) markers (Fig. S9).
The fractions of transcripts from each class (mRNA, intronic and antisense lncRNAs) detected in each compartment (nucleus, cytoplasm, and both) are displayed in Figure 1D. Forty-one percent of the intronic lncRNAs and 25% of the antisense lncRNAs were detected exclusively in the nuclei of HeLa cells. In fact, several reports have demonstrated the involvement of lncRNAs in nuclear functions, such as the control of the epigenetic state of the cell, alternative splicing and RNA polymerase binding efficiency.8,9 On the other hand, it is interesting to note that the majority of the lncRNAs (58% and 73% of the intronic and antisense lncRNAs, respectively) were also detected in the cytoplasm. This finding is in agreement with previous reports of lncRNAs with cytosolic functions related to protein localization, translation and RNA stability.31-34 We also noted that a considerable fraction of mRNAs (16%) were detected exclusively in the nucleus (Fig. 1D). These mRNAs may include transcripts with a very rapid cytoplasmic turnover, those that are inefficiently transported to the cytoplasm, or those that are selectively retained in the nucleus as part of a post-transcriptional mechanism to regulate gene expression.35-37
To gain insight into the stability of lncRNAs detected in different subcellular compartments, we compared the half-lives of the nuclear-enriched transcripts (fold-enrichment ≥ 3, q-value < 0.1%) with those of the transcripts detected at roughly equal levels in both the nucleus and cytoplasm (Fig. S10). We observed that the nuclear-enriched intronic/antisense lncRNAs were significantly more unstable (p < 0.0001) compared with the transcripts detected in both compartments (Fig. S10A–D). The greater instability of the nuclear lncRNAs is in agreement with previous observations by Clark and collaborators.27 We observed that the nuclear-enriched mRNAs showed a less pronounced but similar pattern of stability according to subcellular localization compared with the intragenic lncRNAs (Fig. S10E, F).
Intronic and antisense lncRNA detection across cell lineages reveals cell type-specific expression pattern compared with that of mRNAs
Several lncRNAs exhibit specific patterns of expression according to the cell and tissue type and during organism development.4,5,38,39 To investigate the tissue expression patterns of the intronic and antisense lncRNAs, we compared their expression profiles with those of protein-coding mRNAs measured in the following 3 cell lineages in addition to HeLa cells: MIA PaCa-2 (pancreas), DU-145 (prostate) and MCF-7 (breast) (Fig. 2A–C). While 78% (4,568/5,866) of the mRNAs were detected in all 4 cell types (Fig. 2C), only 42% of the intronic (1,655/3,937) lncRNAs (Fig. 2A) or the antisense (2,467/5,889) lncRNAs (Fig. 2B) were expressed in all 4 cell types, indicating a higher cell type-specific expression of lncRNAs compared with mRNAs. On average, expression levels of the intronic (1,051 a.u., arbitrary units of fluorescence intensity) and antisense (657 a.u.) lncRNAs were lower than those of the protein-coding mRNAs (5,624 a.u.), in agreement with previous reports.3,19,21,40 Using a fractional expression level (FEL) threshold of greater than 50% to consider a transcript as displaying cell type-specific expression (see Methods for details), we verified that 52% of the intronic and 47% of the antisense lncRNAs presented cell type-specific profiles, while a smaller fraction of the mRNAs (33%) presented this pattern (chi-square test; p < 0.0001) (Fig. 2D; Fig. S11).
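The FEL criterion can be illustrated with a short Python sketch. The exact FEL definition is given in Methods and is not reproduced in this section, so the sketch assumes a common formulation: the signal in the cell line with the highest expression divided by the summed signal across all cell lines. The function names and intensity values are hypothetical.

```python
def fractional_expression_level(expr):
    """Return (top_cell_line, FEL) for one transcript.

    expr maps cell line name -> expression intensity (arbitrary units).
    FEL is assumed here to be max(expr) / sum(expr); see Methods for the
    definition actually used in the study.
    """
    total = sum(expr.values())
    if total <= 0:
        raise ValueError("transcript has no detectable signal")
    top = max(expr, key=expr.get)
    return top, expr[top] / total

def is_cell_type_specific(expr, threshold=0.5):
    """Flag a transcript as cell type-specific when FEL exceeds the threshold."""
    return fractional_expression_level(expr)[1] > threshold

# Hypothetical lncRNA with most of its signal in HeLa cells.
lnc = {"HeLa": 700.0, "MIA PaCa-2": 100.0, "DU-145": 100.0, "MCF-7": 100.0}
print(fractional_expression_level(lnc), is_cell_type_specific(lnc))
```

Under this assumed definition, a transcript expressed equally in all 4 cell lines has an FEL of 0.25, so the 0.5 threshold selects transcripts for which a single cell line contributes more than half of the total signal.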
Figure 2.

Cell type specificity of lncRNAs and mRNAs. Venn diagrams showing the expression overlap of (A) intronic lncRNAs, (B) antisense lncRNAs, and (C) mRNAs detected in each of the 4 cell lineages Mia PaCa 2, DU-145, MCF-7, and HeLa. (D) The fractional expression levels (FELs) of the lncRNAs (intronic, n = 3,937; antisense, n = 5,889) and mRNAs (n = 5,866) across the 4 cell lineages were calculated (see Methods for details). The percentages of transcripts with an FEL higher or lower than 0.5 in each class are shown. Higher FEL values correspond with a more highly cell-specific expression pattern. The observed fractions of cell-specific intronic and antisense lncRNAs are significantly different compared with those of the mRNAs. (***) Chi-square test, p < 0.0001. The data reflect results obtained from 2 independent replicates.
Among the 3,477 intronic and 5,572 antisense lncRNAs found to be expressed in at least one of the cell lineages and for which the host protein-coding gene was also present in the oligoarray, we observed that between 7 and 13% of the intronic and between 8 and 14% of the antisense lncRNAs were expressed from loci that did not express the corresponding host mRNA in the same cell lineage (Table 1). It has been previously proposed that subsets of intronic and antisense lncRNAs exhibit specific and independent expression profiles relative to their host protein-coding genes.5,41
Table 1.
Protein-coding gene loci accumulating only intragenic lncRNAs in human cell lines. Numbers of gene loci at which intronic or antisense lncRNAs, but not the protein-coding mRNA, were detected.
| Cell line | Loci expressing only intronic lncRNAs (% of the total) | Loci expressing only antisense lncRNAs (% of the total) |
|---|---|---|
| Mia PaCa 2 | 229 (13.0) | 414 (14.4) |
| DU-145 | 212 (9.0) | 399 (9.4) |
| MCF-7 | 222 (9.2) | 411 (10.1) |
| HeLa | 203 (6.8) | 373 (8.4) |
To gain further insight into the biogenesis of intronic lncRNAs and to estimate the relative presence of bona fide transcripts and splicing lariats within this population, we compared the expression levels, half-lives and presence or absence of a 5′ cap for intronic lncRNAs expressed from gene loci accumulating only lncRNAs versus those co-expressed with mRNAs (Fig. S12). Interestingly, intronic transcripts that were not co-expressed with mRNAs at the same locus in HeLa cells had lower expression levels than those that were co-expressed (median 66 a.u. vs. 82 a.u., respectively; Mann-Whitney test, p < 0.05), in addition to longer half-lives (median 4.1 h vs. 2.1 h, respectively; Mann-Whitney test, p < 0.0001) and a higher prevalence of the 5′ cap (94% (167/178) vs. 85% (2,114/2,491), respectively; chi-square test, p < 0.002). Our results further support the notion that lncRNAs produced in intronic regions are neither mere byproducts resulting from spurious RNAP II transcription in the surrounding open chromatin of active genes nor the consequence of persistent lariats.
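The 5′ cap comparison reported above (167 of 178 vs. 2,114 of 2,491) can be checked with a minimal Python sketch of a 2 × 2 chi-square test. The sketch uses the Pearson statistic without continuity correction; whether a correction was applied in the original analysis is not stated, so this is an assumption, and the function name is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom, no continuity correction).

    Table layout:        capped  uncapped
      lncRNA-only loci     a        b
      co-expressed loci    c        d
    Returns (chi2, p_value).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Counts from the text: 167/178 capped vs. 2,114/2,491 capped.
chi2, p = chi_square_2x2(167, 178 - 167, 2114, 2491 - 2114)
print(round(chi2, 2), p)
```

With these counts the statistic is close to 10.7, which corresponds to a p-value of about 0.001, in line with the reported p < 0.002.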
Intronic and antisense lncRNAs are enriched in genomic marks that suggest independent transcriptional units
Using publicly available ENCODE ChIP-seq data,1,42 we evaluated the presence of the promoter-associated H3K4me3 mark in the vicinity of the TSSs of the intragenic lncRNAs detected in the 4 cell lines evaluated in this study (Fig. 3A). We also verified the frequencies of histone marks associated with poised (H3K4me1) and transcriptionally active (H3K27Ac) enhancers (Fig. 3B, C, respectively). The observed cumulative percentages of chromatin marks located within 2 kb of the annotated TSSs of intronic or antisense lncRNAs were 33% and 29% (H3K4me3), 77% and 73% (H3K4me1) and 60% and 55% (H3K27Ac), respectively. The signals from these 3 histone marks were lower compared with those of the mRNAs (Fig. 3A–C), but were significantly different from those of random sequences (KS test, p < 0.05), indicating that intragenic lncRNA loci exhibit chromatin features typical of independent RNAP II transcriptional units. For a comparison, we performed the same analysis with lncRNAs annotated in the GENCODE catalog as monoexonic intronic, monoexonic antisense and intergenic transcripts (Fig. S13A–C) and obtained similar results.
Figure 3.
Association of intronic and antisense lncRNA loci with regulatory chromatin marks. Distance distributions of (A) H3K4me3, (B) H3K4me1, (C) H3K27Ac, and (D) CpG islands within a 10 kb window relative to the genomic coordinates of the predicted TSSs of intronic (red line) and antisense (blue line) lncRNAs and mRNAs (green line) expressed in at least one of the 4 cell lineages studied. Ten random sets of sequences with the same numbers, lengths and genomic contexts were used as negative controls (gray lines, see Methods for details). The mRNA and intronic and antisense lncRNA distributions were significantly different relative to the average distributions for the random control sequences (Kolmogorov-Smirnov test, p < 0.005).
CpG island hypermethylation is a well-established modification associated with epigenetic silencing of protein-coding genes; however, little is known about the roles of epigenetic modifications in lncRNA transcriptional regulation. We investigated the effects of DNA demethylation on the expression of intronic and antisense lncRNAs in 3 tumor cell lines (Mia PaCa 2, DU-145 and MCF-7) using a demethylating agent, 5-aza-2′-deoxycytidine (5-Aza). The efficacy of the 5-Aza treatment was monitored by measuring the re-expression of NPTX2, GAGE 2A and GSTP1 in Mia PaCa 2, DU-145 and MCF-7 cells, respectively. These genes have been previously identified as hypermethylated in these cell lines 43-45 (Fig. S14A–C). Oligoarray hybridization assay revealed that 5–20% of the mRNAs were significantly up-regulated with the 5-Aza treatment while only 2–8% of the intronic and 1.5–9% of the antisense lncRNAs were up-regulated in Mia PaCa 2, DU-145 and MCF-7 cells (Table 2). Selected candidates were examined by RT followed by end-point PCR to confirm the array data (Fig. S14D). 5-Aza treatment had a minimal impact on the global gene expression of both the protein-coding genes and lncRNAs; however, its effect on the lncRNAs was even lower, indicating that only a minority of the lncRNAs were epigenetically regulated by DNA hypermethylation. Corroborating this view, informatics analysis showed that only a small fraction of the interrogated lncRNAs contained CpG islands in the vicinity of their transcription start sites (Fig. 3D). For a comparison, we performed the same analysis with the lncRNAs annotated in the GENCODE catalog as monoexonic intronic, monoexonic antisense and intergenic transcripts (Fig. S13D) and obtained similar results.
Table 2.
Effect of DNA demethylation on the transcription of intragenic lncRNAs. The human cell lines Mia PaCa 2, DU-145 and MCF-7 were treated with the demethylating agent 5-Aza or with vehicle (control) for 96 h, 72 h and 96 h, respectively. The numbers of protein-coding mRNAs and lncRNAs that were differentially expressed between the 5-Aza-treated and untreated control cells at a q-value of < 10% (SAM analysis) are shown.
| Cell lineage | Intronic lncRNAs with increased expression by 5-Aza (% of the total) | Antisense lncRNAs with increased expression by 5-Aza (% of the total) | mRNAs with increased expression by 5-Aza (% of the total) |
|---|---|---|---|
| Mia PaCa 2 | 165 (7.8) | 293 (8.9) | 1,049 (20.0) |
| DU-145 | 56 (2.0) | 72 (1.5) | 266 (4.8) |
| MCF-7 | 117 (4.3) | 90 (2.1) | 432 (8.0) |
Intronic and antisense lncRNAs display evidence of evolutionary conservation
We used a catalog of vertebrate transcripts syntenically mapped to the human genome 46 to estimate the degree of evolutionary conservation at the expression level of the set of intragenic lncRNAs under investigation. We identified that 13% and 34% of the intronic (n = 512) and antisense (n = 1,988) lncRNAs, respectively, detected in human cells map to conserved genomic regions that are transcribed in at least one other vertebrate species among the 15 different species interrogated (Fig. 4A, E). This result was statistically significant compared with control data sets of randomly selected sequences (Chi-square test, p < 0.01). For comparison, we performed the same analysis using lncRNAs annotated in GENCODE, and found a comparable proportion of conservation for these well-annotated intronic (16%), antisense (29%) and intergenic (25%) lncRNAs (Figs. S15–17). Conceivably, evolutionary distances across species are not reflected in the TransMap analysis due to different levels of transcriptome annotation in each organism (Fig. 4A, E). We also evaluated the expression conservation of homologous lncRNAs based on sequence similarity of ESTs (see Methods for details) and as expected found a greater expression conservation of human intragenic lncRNAs with primates, followed by mammals and vertebrates (Fig. 4B, F). Comparable results were observed using GENCODE intragenic and intergenic lncRNAs (Figs. S15–17). We also searched for evidence of lncRNA conservation at the DNA sequence level. We found that an excess of conserved DNA elements annotated in vertebrates, mammals and primates overlapped intronic and antisense lncRNAs compared with randomly selected sequences (chi-square test, p < 0.0001) (Fig. 4C, G). For lncRNAs annotated in GENCODE, only the monoexonic antisense lncRNAs showed evidence of conservation across all groups (Fig. S16C). For ENCODE lincRNAs only the mammals and vertebrates datasets showed statistical significance compared to random groups (Fig. S17C). 
No significant enrichment of conserved elements was observed for GENCODE monoexonic intronic transcripts (Fig. S15C). It is interesting to note that, for our set of intragenic lncRNAs (Fig. S18), we found a greater level of sequence conservation around the annotated TSS (up to 500 nt upstream or downstream), as measured by PhastCons scores, compared with control random sequence sets (Wilcoxon test, p < 0.0001). Similar results were obtained for GENCODE monoexonic intragenic lncRNAs and lincRNAs (Fig. S19). This could indicate the presence of purifying selective pressure on regulatory regions of lncRNAs.
Figure 4.

Conservation analysis of intronic and antisense lncRNAs. (A and E) TransMap cross-species cDNA alignments of 15 vertebrate species with the human genome (species indicated in the rows) overlapping (A) 512 (13% of total) intronic lncRNAs and (E) 1,988 (34% of total) antisense lncRNAs expressed in at least one cell line investigated in this study and with conserved expression in at least one species (red and blue dashes show expression conservation). Higher proportions of expression conservation of the intronic (A) and antisense (E) lncRNA datasets were found compared with 10 control data sets of randomly selected sequences with the same numbers, transcript lengths and genomic context (Chi-square test, p < 0.01). (B and F) Number of human intronic (B) or antisense (F) lncRNAs with sequence similarity to ESTs from selected vertebrate, mammalian and primate species. The same number of randomly selected ESTs from each species was used to avoid sampling bias. (C and G) DNA sequence conservation of intragenic lncRNAs within vertebrate, mammal and primate groups. The asterisks show statistically significant differences in the numbers of conserved PhastCons elements overlapping intronic (C) and antisense (G) lncRNAs compared with 10 sets of randomly selected genomic regions with the same numbers, lengths and genomic context (gray bars, *** Chi-square test, p < 0.0001). (D and H) Venn diagram with overlaps determined by 3 different conservation analyses of intronic (D) and antisense (H) lncRNAs. RNAz predicted secondary structure conservation for 97 and 164, PhastCons predicted DNA conservation for 2,036 and 4,545, and TransMap predicted expression conservation for 512 and 1,988 intronic and antisense lncRNAs, respectively.
We also investigated the potential for intragenic lncRNAs to form stable secondary structures. We found that 97 intronic and 164 antisense lncRNAs had stable secondary structures, as predicted by RNAz,47 accounting for ∼3% of the intragenic lncRNAs detected in human cells. A similar analysis performed in parallel using a set of random sequences identified 17 and 81 loci with predicted structures in the intronic and antisense controls, respectively. The observed number of structured lncRNAs is significantly greater than that expected by chance alone (Chi-square test, p < 0.0001), indicating that at least a fraction of intragenic lncRNAs expressed in human cells (> 1.5%) bear bona fide thermodynamically stable secondary structures. Comparable results were obtained using intronic (6.5%), antisense (4.4%) and intergenic (3.3%) lncRNAs from GENCODE. By combining the 3 analyses described above (TransMap, PhastCons and RNAz) we found 12 intronic and 39 antisense lncRNAs with multiple lines of evidence of evolutionary conservation (Fig. 4D, H).
Intronic and antisense lncRNAs are correlated with protein-coding gene expression in cis and in trans
Various lncRNAs have been shown to act in cis to regulate the expression of nearby protein-coding genes.48-51 We calculated the Pearson correlation coefficient between each lncRNA/host mRNA pair using the expression values measured in the 4 cell lineages under study. The distribution of lncRNA/host mRNA correlation values was significantly different from that measured using a random lncRNA/mRNA data set (Kolmogorov-Smirnov test, p < 0.0001) (Fig. 5), revealing that large proportions of the intronic (200, 16%) and antisense (188, 10%) lncRNAs were positively correlated (r ≥ 0.7) with their host mRNAs and suggesting the existence of co-regulation between them.
Figure 5.

Co-expression between lncRNAs and mRNAs from the same locus. Distribution of Pearson correlation coefficients between the intronic (A) or antisense lncRNAs (B) and mRNAs expressed from the same gene locus across 4 cell lineages (black bars). As a control, correlations were also calculated following the shuffling of lncRNA sequences across gene loci (gray bars). Significant differences in the correlation distributions of lncRNA/mRNA expressed from the same locus and from shuffled loci were determined using the Kolmogorov-Smirnov nonparametric test (p < 0.0001).
A comparison of the intronic lncRNAs that were correlated (r ≥ 0.7) with the mRNA expressed from the same locus with those that were not correlated (r < 0.7) allowed for the discernment of 2 significantly different groups (Mann-Whitney test, p < 0.0001), with the correlated intronic lncRNAs having higher expression levels (median 213 a.u.) than the non-correlated ones (149 a.u.). Additionally, the correlated intronic lncRNAs were significantly less stable (median t1/2 = 1.8 h vs. 2.7 h; Mann-Whitney test, p < 0.02) and a greater proportion of them were uncapped (18% vs. 11%, respectively; Fisher's exact test, p < 0.02) compared with the non-correlated lncRNAs, indicating that these transcripts were enriched in stable intron lariats from highly expressed genes.
We also evaluated the co-expression patterns of lncRNAs and protein-coding mRNAs across the 4 cell lineages. We detected 1,509 intronic and 2,186 antisense lncRNAs that were positively or negatively correlated (|r| ≥ 0.8) with at least one protein-coding gene (p < 0.001), suggesting that a subset of intragenic lncRNAs may act in trans to regulate the expression of genes located at distant loci, as already documented for intergenic lncRNAs.52-54
To facilitate further exploration of this dataset, a compilation of the data pertaining to intragenic lncRNA biogenesis, processing, stability, subcellular localization, cis and trans co-expression with mRNA loci, conservation, secondary structure prediction, association with chromatin marks and cell type-specific expression is provided as supplementary material (Table S3). This resource can be used to filter out transcripts that presumably represent splicing lariats/not independent transcripts based on different criteria and to select putative bona fide intragenic lncRNAs for detailed characterization. As an example, we identified a set of 107 intragenic lncRNAs (40 intronic, 67 antisense) with the following criteria: presence of a 5′ cap, half-life of ≥1 h, positioned within 2 kb of the CAGE and H3K4me3 marks and conserved across vertebrates (Table S4). Among these lncRNAs, 20 are co-expressed and correlated in cis to 20 protein-coding loci, and 99 are co-expressed and correlated in trans with one or multiple mRNAs. This set also included 11 intragenic lncRNAs (5 intronic, 6 antisense) expressed in cell lines in which the mRNA from the corresponding host gene locus was not detected in at least one cell line. Selected examples of intronic and antisense lncRNAs are shown in the genomic context as supplementary material (Figs. S20 and S21).
Discussion
In the present work, we report data obtained from biochemical and computational analyses that provide new molecular, biological and functional insights regarding intragenic long noncoding RNAs transcribed from regions that overlap introns of protein-coding genes. Notably, less than 10% of the approximately 16,000 intragenic lncRNAs examined in this work match current lncRNA annotations in genomic databases (GENCODE v.21 and NONCODE v.4), highlighting the currently incomplete knowledge of the noncoding transcriptome. This lack of annotation is due to both the notoriously lower abundance and the greater tissue specificity of lncRNAs relative to protein-coding mRNAs; thus, lncRNAs require more sensitive methods for their detection in different cell types and conditions. In addition, reconstructing bona fide intragenic lncRNA transcripts that overlap with the mRNAs produced from the same loci is difficult.
Classification of lncRNAs is still challenging and can be made based on different criteria, one of them being their association with annotated protein-coding genes.55 Though it is largely assumed18,56 that RNAs spanning introns represent unstable spliced intron lariats, some stable and functional spliced intron lariats have been demonstrated,57,58 and the findings of the present study suggest that the class known as intronic lncRNAs, i.e., those with the same orientation as the host mRNA, comprises a heterogeneous group of transcripts that can be distinguished according to their stability profiles. In agreement with a previous study,27 we have shown that intronic lncRNAs (median t1/2 = 2.1 h) are more unstable than antisense lncRNAs (median t1/2 = 3.9 h) and mRNAs (median t1/2 = 3.2 h). We have also shown that intronic lncRNAs comprise transcripts with or without a 5′ cap.
In our study, we found that most 5′-capped antisense (72%, 492 out of 688) and many 5′-capped intronic (46%, 321 out of 703) lncRNAs expressed in HeLa cells are stable (t1/2 ≥ 3 h) after transcription blocking with actinomycin D. In apparent contradiction, a recent study using the same cell line reported that most lncRNAs emanating from DNase hypersensitive sites are unstable based on their sensitivity to exosome degradation.59 We noted that only 33 antisense (5% of total) and 56 intronic (8% of total) lncRNAs from our set were analyzed in that study. For these, the same fraction of stable intronic lncRNA transcripts was observed in both studies (39%). For the antisense lncRNAs analyzed in both studies, a 2-fold greater fraction was classified as stable transcripts in our study (67% vs. 33%). We think the small overlap between the 2 data sets and the heterogeneity of the transcript biotypes classified under the “unannotated RNAs” category in their study59 combine to explain the apparent discrepancy in the stability of antisense lncRNAs between the 2 studies. We also note that the 2 studies used different methods to examine RNA stability (transcriptional blocking via actinomycin D versus blocking of a specific RNA decay pathway via exosome depletion) that would not necessarily give identical results. Despite these limitations, this analysis suggests that the intragenic lncRNAs investigated in our study share features with the exosome-insensitive lncRNAs reported by Andersson et al.59
The 5′ cap modification is likely to contribute to the stability of intronic lncRNAs, as demonstrated by the finding that the frequencies of 5′-capping of the long-lived (t1/2 ≥ 3 h) and short-lived (t1/2 < 3 h) intronic lncRNAs were 98% and 82%, respectively (p < 0.0001). Another mechanism of stabilizing intronic lncRNAs has been recently described in a report identifying hundreds of intronic sequences that escape from debranching by the formation of regulatory circular RNA molecules (ciRNAs).18 The lower frequency of 5′-capping of the short-lived intronic lncRNAs shown here suggests they are mainly lariats resulting from pre-mRNA splicing that accumulate in cells.
Although a fraction of intronic and antisense lncRNAs did not have 5′ cap modifications, most of them (85% and 95%, respectively) showed evidence of 5′-capping, indicating that the 5′ cap contributes positively to lncRNA stability. In accordance with our findings, a previous study in yeast has demonstrated that decapping by DCP2 negatively modulates the expression of over 100 lncRNAs.14 The authors have found that the degradation of lncRNAs located proximal to inducible genes is required for these genes to respond to environmental changes related to galactose metabolism. Failure to destabilize GAL10 lncRNA, for example, results in changes in the acetylation statuses of the histones associated with this locus, which represses the induction of the proximal GAL1 mRNA,14 suggesting that decapping regulates lncRNA stability and that lncRNA stability itself might have a regulatory function.
Another factor that impacts RNA stability is subcellular localization. Clark and collaborators 27 have shown in a previous study of mouse cell lines that nuclear-enriched lncRNAs have shorter half-lives compared with those detected in both the nucleus and cytoplasm, which is consistent with possible chromatin-associated regulatory functions of these transcripts. Here, we expanded this analysis by examining a set of intronic and antisense lncRNAs expressed in human cells. We found that a major fraction of the nuclear-enriched intronic lncRNAs (90%) were short-lived (t1/2 < 3 h) and that approximately half of them had a half-life of < 1 h. This finding may reflect the presence of unstable spliced introns and/or of pre-mRNA fragments that accumulated in the cell.
Our analysis shows that most short-lived intronic lncRNAs have a 5′ cap. However, it does not differentiate independent transcriptional events, in which 5′ cap modification occurs along with RNA Pol II transcription, from those in which the 5′ cap is added after RNA processing events.60,61 Nor does it distinguish 5′-capped unprocessed pre-mRNAs from mature transcripts. The presence of a 5′ cap needs to be integrated with other biochemical and genomic evidence in order to determine whether any given transcript is independently transcribed or results from the processing of a longer RNA. The observation of the significant enrichment of chromatin marks associated with actively transcribed promoters (H3K4me3) and active enhancers (H3K4me1 and H3K27ac) in the vicinity of the putative TSSs of a fraction of the intronic lncRNAs supports the notion of these RNAs being independently transcribed. It is conceivable that these intronic lncRNAs may exert functional roles that require short half-lives, similar to other classes of lncRNAs. For example, lincRNAs that associate with chromatin-binding proteins exhibit lower stability, suggesting that lncRNA “transcription factors” may exist with properties analogous to those of proteins.28,51,62 Likewise, the nuclear-enriched antisense lncRNA ANRASSF1, which interacts with PRC2 and regulates the expression of RASSF1A in cis, has a short half-life (∼50 min).51
LncRNAs may affect the expression of nearby protein-coding mRNAs in cis 10,51,57,63 or of distantly located loci in trans.52,53,64-67 Interestingly, a number of the intronic (686, 17% of the total detected) and antisense (809, 14% of total detected) lncRNAs showed evidence of containing both the H3K4me1 and H3K27ac marks at their putative TSSs, suggesting that they may belong to the previously described class of intragenic enhancer-associated RNAs (eRNAs).68 We found that the expression levels of some intronic and antisense lncRNAs were positively correlated with those of the host genes in cis and that the expression levels of some lncRNAs were correlated positively or negatively with those of certain genes in trans, suggesting co-regulation between coding and noncoding transcripts in specific biological pathways. Interestingly, the mRNAs co-expressed with intragenic lncRNAs in trans were assigned to enriched GO terms associated with basic cellular processes, such as the cell cycle, DNA replication and cytoskeleton organization (see Fig. S22). This observation supports the involvement of intragenic lncRNAs in complex and intricate lncRNA-mRNA regulatory networks within the cell. Additional studies to examine the phenotypic consequences elicited by depletion of specific intragenic lncRNAs are warranted to confirm their functional role as critical regulators of these networks.
Finally, we evaluated the evolutionary conservation of expression and found a set of intronic and antisense lncRNAs with conserved expression across primates and mammals. Although this analysis does not inform about the conservation of gene structures, it provides further evidence of their potential functionality. It should be noted that this analysis was limited by the different degrees of annotation accuracy and of coverage depths of the transcriptomes from the different species. The observed evolutionary conservation of intragenic lncRNAs is in line with the recently reported evolution of lncRNA repertoires and expression patterns in tetrapods.69
In summary, in this work we have generated a detailed catalog with novel information regarding the biogenesis, processing, stability, subcellular localization, cis and trans co-expression with mRNA, and cell type-specific expression of a representative subset of unspliced lncRNAs expressed from intronic regions of the human genome. These data add to current knowledge regarding the biology of lncRNAs and provide a resource that will aid in the experimental characterization of novel regulatory functions.
Materials and Methods
Cell lines and RNA extraction
All cell lines were obtained from the American Type Culture Collection (ATCC) and were grown as recommended in Dulbecco's modified Eagle's medium (DMEM) supplemented with 10% fetal bovine serum and 1% penicillin/streptomycin (Gibco). Total RNA was extracted using TRIzol (Invitrogen) and purified using an RNAspin Mini Kit (GE Healthcare) according to the manufacturer's instructions, except that an extended 1 h treatment with DNase I was performed. Total RNA was quantified with a NanoDrop ND-1000 Spectrophotometer (NanoDrop Technologies). Total RNA integrity was assessed using an Agilent 2100 Bioanalyzer (Agilent Technologies).
Design of intron-exon 44k oligoarray and microarray hybridization procedures
We used a previously designed Agilent custom intron-exon 44k oligoarray.19 Multiple oligonucleotide probes were designed to interrogate a collection of 11,574 unspliced cDNA contigs mapping to intragenic regions of the human genome and identified following the assembly of GenBank RefSeq mRNAs and ESTs.19 The Agilent oligoarray probe design recommendations were used, as follows: the 60-mer probes should not have 8 or more bases derived from repetitive regions of the genome or homopolymeric stretches of 7 or more bases (low complexity), and should have a GC content of 35% to 55% and a Tm of 68–76°C. To reduce cross-hybridization, each 60-mer sequence was searched using BLAST against the genome sequence, and any probe presenting a second best hit against a distinct genomic location with a bit-score equal to or greater than 42.1 was excluded. The remaining probes were mapped back to their respective intragenic lncRNA targets and one probe was selected closer to the 3′ end of each target. Over the course of this study, the microarray probes were re-mapped to the latest version of the human genome (hg19) and re-annotated to reflect the updated RefSeq and UCSC gene models (Oct. 2012). According to this new annotation, the oligoarray contained 6,322 probes interrogating intronic lncRNAs, 10,482 probes interrogating antisense lncRNAs and 6,360 probes interrogating protein-coding transcripts. The re-annotated array design is deposited in GEO under the accession number GPL19372.
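The sequence-level probe criteria described above can be illustrated with a minimal Python sketch. It covers only probe length, GC content and homopolymer runs; the repeat, Tm and BLAST cross-hybridization filters are not reproduced, and all function names are hypothetical rather than part of the original design pipeline.

```python
import re

def gc_fraction(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def longest_homopolymer(seq):
    """Return the length of the longest run of one repeated base."""
    runs = (len(m.group(0)) for m in re.finditer(r"(.)\1*", seq.upper()))
    return max(runs, default=0)

def passes_basic_filters(seq, length=60, gc_min=0.35, gc_max=0.55, max_run=6):
    """Check length, GC content (35-55%) and low complexity (no run of 7+)."""
    if len(seq) != length:
        return False
    if not gc_min <= gc_fraction(seq) <= gc_max:
        return False
    return longest_homopolymer(seq) <= max_run
```

A probe that passes these checks would still need to satisfy the remaining criteria (repeat content, Tm and BLAST specificity) before selection.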
Labeled targets were generated by in vitro transcription using 200 ng total RNA and a T7-poly-dT-based Low Input Quick Amp Labeling Kit (Agilent). Cy3- or Cy5-labeled cRNA targets (see below) were incubated with arrays using a Gene Expression Hybridization Kit (Agilent Technologies), as recommended by the manufacturer. The arrays were washed according to the Agilent SSPE wash protocol v. 2.1 and scanned with a High-Resolution Microarray Scanner (Agilent Technologies). Images were quantified using Agilent Feature Extraction Software (version 9.5).
α-Amanitin assay
For RNA polymerase II inhibition, 5 × 10⁵ HeLa cells were seeded per 10-cm-diameter dish and cultured for 24 h. Subsequently, the medium was replaced with fresh medium containing 50 μg/ml α-amanitin (Sigma) or vehicle (water) as a control, and the cells were treated for 9 h. The cells were then washed once with ice-cold phosphate-buffered saline, harvested, and pelleted, and total RNA was extracted. Random hexamer-primed RT-qPCR was performed to measure the specific genes used as positive controls (c-myc, transcribed by RNAP II) and negative controls (18S rRNA and 45S rRNA, transcribed by RNAP I, and pre-tRNATyr and 7SK, transcribed by RNAP III) of transcriptional inhibition by α-amanitin (the primers used are listed in Table S2). Two-color array measurements were performed to compare the samples treated with α-amanitin with those treated with vehicle. Four biological replicates were assessed with dye swapping. The averages of the dye swap ratios were used (control/α-amanitin). The probes with significantly greater expression (as determined via the 2-sided t-test) compared with the corresponding background level (flag ‘IsPosAndSignif’) in at least 3 of the 4 control sample replicates were defined as expressed and further analyzed (13,130 probes were kept). The raw and normalized data are deposited in GEO (GSE62989). The significance analysis of microarrays (SAM) approach70 was employed to identify transcripts affected by α-amanitin treatment with the following parameters: one-class response, 16 permutations, K-nearest neighbors imputer, and q-value ≤0.1%.
RNAseq Analysis
The genomic coordinates of sequence contigs resulting from the assembly of strand-specific ENCODE/CSHL long RNAseq data 24 were downloaded from UCSC genome browser (wgEncodeCshlLongRnaSeq track, hg19/GRCh37). Only unspliced contigs with lengths of greater than 200 nucleotides were kept for further analysis (n = 16,804). We used BEDTools software package to compare the genomic coordinates of these contigs with those of our set of intronic and antisense lncRNAs and to calculate the coverage percentage. Analysis was performed using datasets from 22 cell lineages (CD34+ Mobilized, HAoAF, HAoEC, HCH, HFDPC, HMEpC, HMNCCB, hMNC-PB, hMSC-AT, hMSC-BM, hMSC-UC, HOB, HPC-PL, HPIEpC, HSaVEC, HVMF, HWP, IMR90, NHDF, NHEM.f M2, NHEM M2 and SkMC).
5′ Cap analysis
To test for the presence of a 5′ cap, HeLa total RNA (10 µg) was treated with terminator 5′-phosphate-dependent exonuclease (5′-exo, 1 unit; Epicentre Biotechnologies), which is a processive 5′→3′ exonuclease that digests RNA with a free 5′-monophosphate end and cannot digest RNA that has a 5′ cap. Samples were treated with 5′-exo for 2 h at 30°C without (TAP−/5′-exo+) or with tobacco acid pyrophosphatase (TAP) pre-treatment (10 units; Epicentre Biotechnologies) (TAP+/5′-exo+) for 1 h at 37°C to release the 5′ cap, according to the manufacturer's protocol. RT was performed in the presence of an oligo-dT primer plus a specific primer for snRNA U15A, followed by qPCR with primers for the specific genes used as positive controls for 5′ cap removal (c-myc and α-tubulin) or that used as a negative control (snRNA U15A, which does not have a 5′ cap).71 The primers used are listed in Supplemental Table 2. Two-color array hybridizations were performed, in which TAP+/5′-exo+ was compared with the TAP−/5′-exo+ control sample. Four biological replicates were assessed with dye swapping. The averages of the dye swap ratios were used (TAP−/5′-exo+/TAP+/5′-exo+). The probes with significantly greater expression (as determined via the 2-sided t-test) compared with the corresponding background level (flag ‘IsPosAndSignif’) in at least 3 of the 4 TAP-/5′-exo+ sample replicates were defined as expressed and further analyzed (14,262 probes were kept). The filtered data were normalized by the average intensity of 320 control probes (excluding probes with intensity changes of more than 2 standard deviations of the average) with signals from labeled targets generated from Agilent synthetic spiked-in mRNA. The raw and normalized data are deposited in GEO (GSE62984). 
Significance analysis of microarrays (SAM) approach70 was employed to identify the transcripts affected by the treatment with the 2 enzymes (TAP+/5′-exo+) with the following parameters: one-class response, 16 permutations, K-nearest neighbors imputer, and q-value ≤0.1%.
Actinomycin assay
For actinomycin D treatment, 5 × 10⁵ HeLa cells were seeded per 10-cm-diameter dish and cultured for 24 h in regular medium, which was subsequently replaced with fresh medium containing 10 μg/ml actinomycin D (Invitrogen) or vehicle DMSO (mock). At each time point (0, 1, 3, 6 and 8 h), treated and control cells were harvested for RNA extraction. Oligo-dT-primed RT-qPCR was performed to measure the half-life of c-Myc to verify the efficiency of transcriptional inhibition, and it was determined to be 0.38 h, which is in agreement with published data72 (Fig. S1D; primers in Table S2). For two-color array hybridizations, samples treated with actinomycin D at each time point and labeled with Cy3 were compared with an RNA pool generated from samples treated with a DMSO vehicle and labeled with Cy5 (used as a reference). Two biological and 2 technical replicates were assessed at each time point. Only the probes showing significant expression in the reference sample as determined by the 2-sided t-test (‘IsPosAndSignif’ flag in Feature Extraction output) and a signal that was at least 2-fold greater than the local background measurement in at least 3 of the 4 replicates at each time point were further analyzed (5,874 probes were kept). The time-course expression data set was normalized by the expression level that was measured prior to treatment (0 h), which was set to 1. We calculated a normalization factor by adjusting the average fold change between each actinomycin D treatment time point and vehicle to zero (log scale) for the 7 most stable transcripts with relative abundances that appeared to increase over time due to the overall decrease in RNA mass.26,27 The intensity values of all transcripts at each time point were scaled based on this calculated normalization factor. The raw and normalized data are deposited in GEO (GSE62963).
Transcript half-lives were calculated by fitting one-phase exponential decay and linear regression models, and the value obtained with the model that produced the larger R² was kept. Transcripts with an R² of <0.7 were excluded from further analysis. GraphPad Prism software version 5.04 was used to calculate the half-lives.
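The model-selection step described above can be sketched in Python as follows. This is only an illustration and not the GraphPad Prism procedure: the exponential model is fitted here by log-linear least squares (an assumed simplification of Prism's nonlinear fit), both R² values are computed on the original intensity scale, and the function names are hypothetical. Input abundances are assumed to be positive and normalized to the 0 h value.

```python
import math

def linear_fit(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(y, y_pred):
    """Coefficient of determination on the original scale."""
    my = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def half_life(times, abundance, min_r2=0.7):
    """Return (half_life, model, r2), or None if no acceptable decay fit."""
    candidates = []
    # Exponential decay, fitted as ln(y) = ln(y0) - k*t (assumed simplification).
    ln_y0, neg_k = linear_fit(times, [math.log(v) for v in abundance])
    if neg_k < 0:
        pred = [math.exp(ln_y0 + neg_k * t) for t in times]
        candidates.append((r_squared(abundance, pred), math.log(2) / -neg_k, "exponential"))
    # Linear decay: time at which the fitted line reaches half its intercept.
    a, b = linear_fit(times, abundance)
    if b < 0:
        pred = [a + b * t for t in times]
        candidates.append((r_squared(abundance, pred), -a / (2 * b), "linear"))
    if not candidates:
        return None
    r2, t_half, model = max(candidates)  # keep the model with the larger R2
    return (t_half, model, r2) if r2 >= min_r2 else None
```

For example, `half_life([0, 1, 3, 6, 8], values)` would be called once per transcript with its normalized time-course values.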
Cell fractionation assay
Subcellular fractionation was performed as described by Topisirovic.73 A 2-color design was used, in which nuclear and cytoplasmic fractions were labeled and co-hybridized to the same array. Two biological replicates were assessed with dye swapping. The averages of the dye swap ratios were used. The probes with signals that were significantly greater than the corresponding background signal (2-sided t-test, flag “IsPosAndSignif”) in at least one of 2 replicates of the nuclear or cytoplasmic samples and with concordant expression in the nucleus/cytoplasm replicates (the same direction of increased/decreased expression) were defined as expressed and further analyzed (11,657 probes were kept). Considering that the amount of total RNA present in the cytoplasm exceeded that found in the nucleus,35,74 it is conceivable that the use of equal amounts of RNA from the nuclear- and cytoplasmic-enriched fractions for the array hybridizations could have led to the apparent nuclear enrichment of some transcripts. To attenuate this effect, we estimated the relative amounts of nuclear and cytoplasmic RNA based on 10 fractionation experiments followed by measurements of RNA mass and calculated a correction factor (1: 2.8, nucleus:cytoplasm), which was used to normalize the gene expression data across all replicates. Additionally, differences in cRNA target generation between the nuclear and cytoplasmic fractions were corrected using signals from control probes detected with labeled targets generated from Agilent synthetic spiked-in mRNAs (10 sets of 32 control probes, the average values were used). The raw and normalized data are deposited in GEO (GSE62985). Transcripts with ≥3-fold enrichment in the nucleus or cytoplasm for both replicates and a q-value of <0.1% (SAM approach 70) were considered enriched in this compartment.
Cell type-specific expression
Two independent replicate RNA preparations from Mia PaCa 2, DU-145, MCF-7 and HeLa cells were labeled and hybridized to intron-exon 44k oligoarrays with dye swapping. The probes with significantly greater expression compared with the corresponding background level (as determined by the 2-sided t-test, flag ‘IsPosAndSignif’) in at least 3 of the 4 replicates were defined as expressed and further analyzed. The raw data were normalized by the quantile method so that the expression values between the lineages were comparable.75 To evaluate cell type specificity, we calculated the fractional expression level (FEL) for each transcript and cell line, corresponding to the fraction of the cumulative normalized intensity detected in all cell lines under study.
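As an illustration of the FEL definition above, the following minimal Python sketch computes, for one transcript, the share of the summed normalized intensity contributed by each cell line. The function name and the example intensities are hypothetical.

```python
def fractional_expression(intensities):
    """Map each cell line to its fraction of the summed normalized intensity."""
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("summed intensity must be positive")
    return {cell: value / total for cell, value in intensities.items()}

# Hypothetical normalized intensities (a.u.) for a single transcript.
example = {"HeLa": 700.0, "MCF-7": 100.0, "DU-145": 100.0, "Mia PaCa 2": 100.0}
fel = fractional_expression(example)
```

With 4 cell lines, a uniformly expressed transcript would have an FEL of 0.25 in each line, so values well above 0.25 point to cell type-enriched expression.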
5-Aza-2′-deoxycytidine assay
Demethylation of cytosine residues was achieved by exposing cells (2 × 10⁶ seeded per 15-cm-diameter dish) to culture media containing a methyltransferase inhibitor, 5-aza-2′-deoxycytidine (Aza) (Sigma), at 5 μM for 72 h for DU-145 and 96 h for Mia PaCa 2 and MCF-7. Mock-treated cells (control) were cultured with an equivalent volume of water. RT-PCR was performed to measure the methylation statuses of the promoters of the specific genes used as controls for Aza treatment efficacy (NPTX2, GAGE 2A and GSTP1 in Mia PaCa 2, DU-145 and MCF-7 cells, respectively). The primers used are listed in Supplemental Table 2.43-45 Two-color array measurements were performed to compare the samples treated with 5-Aza with those treated with vehicle. Two biological replicates were assessed with dye swapping. The probes with significantly greater expression (as determined via the 2-sided t-test) compared with the corresponding background level (flag ‘IsPosAndSignif’) in 3 of the 4 control sample replicates were defined as expressed (biological plus technical replicates). The raw and normalized data are deposited in GEO (GSE62959, GSE62955, and GSE62958). The significance analysis of microarrays (SAM) approach70 was employed to identify transcripts affected by the 5-Aza treatment with the following parameters: one-class response, 16 permutations, K-nearest neighbors imputer, and q-value ≤10%.
Strand-specific RT-PCR
Orientation-specific RT was performed with 1.5 µg total RNA using a gene-specific primer complementary to each intronic or antisense transcript, according to the recommendations of the Super Script III Kit protocol (Invitrogen). Subsequently, end-point PCR (40 cycles) was performed using an internal primer. The primers used are listed in Supplemental Table 2. To control for the absence of RNA self-annealing and the absence of DNA contamination in the RNA sample, reverse transcription was performed without RT primers, followed by qPCR with the internal primer pair.
Reverse transcription and quantitative real-time PCR (RT-qPCR)
Oligo-dT- or random hexamer-primed reverse transcription (RT) was performed using 1 µg of total RNA according to the Super Script III kit protocol (Invitrogen). The relative transcript levels were determined by performing quantitative real-time PCR (qPCR) (the primers used are shown in Table S2) with Power SYBR Green (Applied Biosystems) according to the delta Ct method,76 using a 7500 Real-Time PCR System (Applied Biosystems). All primer sets used in the RT-qPCR experiments were tested to confirm their amplification efficiency at different template concentrations. The amplification efficiency ranged from 91 to 110%.
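The two quantities mentioned above can be written out as a short Python sketch. The relative level follows the usual delta Ct form, 2^−(Ct target − Ct reference), which assumes roughly 100% amplification efficiency. The efficiency function assumes a standard-curve design (Ct plotted against log10 of template amount), which the text does not specify; the function names are hypothetical.

```python
def relative_level(ct_target, ct_reference):
    """Delta Ct relative abundance: 2 ** -(Ct_target - Ct_reference)."""
    return 2.0 ** -(ct_target - ct_reference)

def amplification_efficiency(slope):
    """Percent efficiency from the slope of Ct versus log10(template amount).

    Assumes a dilution-series standard curve; a slope of about -3.32
    corresponds to 100% efficiency (a doubling per cycle).
    """
    return (10.0 ** (-1.0 / slope) - 1.0) * 100.0
```

Under this formulation, a target detected 5 cycles later than the reference gene has a relative level of 2^−5, or about 3% of the reference.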
Gene enrichment analysis
We used GO-function package 77 with Benjamini and Yekutieli multiple testing correction 78 and a significance level of p < 0.05. All gene IDs of the protein-coding genes that were present in our 44k oligoarray were included in the reference dataset.
LncRNA/mRNA co-expression analysis
We investigated the co-expression patterns of intronic/antisense lncRNAs and mRNAs, both in cis (lncRNA and mRNA co-expressed from the same locus in a given cell lineage) and in trans (each lncRNA and all mRNAs co-expressed in a cell type). First, we created a list of all of the intronic and antisense lncRNAs that were co-expressed with mRNAs in Mia PaCa 2, DU-145, MCF-7 and HeLa cells. For in cis correlation analysis, we calculated the Pearson correlation coefficient (ρ) using the R software environment (www.r-project.org) and 100 randomly permuted groups to determine the distribution. The Kolmogorov-Smirnov (KS) test was used to compare the continuous probability distributions of the in cis correlation data set with those calculated for each of the 100 random control sets (p < 0.05 as a threshold). We used GraphPad Prism software (GraphPad Software, La Jolla, California, USA) to obtain a histogram of the Pearson correlation distribution in cis. For in trans correlation analysis, we constructed a correlation matrix (using an R script) of 1,655 intronic and 2,467 antisense lncRNAs vs. 4,568 mRNAs expressed in the 4 cell lineages described above. Next, we selected the lncRNAs that were the most highly correlated in trans with a cutoff of ρ < −0.8 or ρ > 0.8 (p < 0.001) and searched for enriched “Biological Process” and “Molecular Function” Gene Ontology terms among the correlated mRNAs (see above).
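The in trans selection step can be sketched in Python as shown below. The original analysis was performed with an R script; this is an illustrative re-expression only. It computes Pearson correlations between every lncRNA and mRNA across the cell lines and keeps pairs with |ρ| above the cutoff. The p-value filter is not reproduced, and the function names and toy data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlated_pairs(lncrnas, mrnas, cutoff=0.8):
    """Return (lncRNA, mRNA, rho) for every pair with |rho| > cutoff."""
    pairs = []
    for lnc_id, lnc_values in lncrnas.items():
        for mrna_id, mrna_values in mrnas.items():
            rho = pearson(lnc_values, mrna_values)
            if abs(rho) > cutoff:
                pairs.append((lnc_id, mrna_id, rho))
    return pairs

# Toy expression values across 4 cell lines (hypothetical).
lncrnas = {"lncA": [1.0, 2.0, 3.0, 4.0]}
mrnas = {"m1": [2.0, 4.0, 6.0, 8.0], "m2": [8.0, 6.0, 4.0, 2.0], "m3": [1.0, 3.0, 2.0, 1.0]}
pairs = correlated_pairs(lncrnas, mrnas)
```

With only 4 cell lines per vector, high correlations can arise by chance, which is why the permutation controls and significance threshold described above remain necessary.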
In silico association with genomic regulatory elements
We used the BEDTools software package79 to compare the genomic coordinates (hg19 GRCh37) of intronic and antisense lncRNA datasets with those of transcriptional regulatory element data sets generated by the ENCODE project1,42 and retrieved from the UCSC Genome Browser: ENCODE/RIKEN CAGE tags from polyA+ RNA-derived libraries from 35 human cell lines (wgEncodeRikenCage tracks); ENCODE ChIP-seq of RNA polymerase II binding sites from 10 cell lines (wgEncodeAwgTfbsUniform tracks); ENCODE/Broad-MGH ChIP-seq data of H3K27ac (17 cell lines), H3K4me1 (16 cell lines) and H3K4me3 (16 cell lines) DNA binding sites (wgEncodeBroadHistone tracks). Tracks with genomic coordinates of CpG islands mapped by epigenome prediction80 were also obtained from the UCSC Browser. Regulatory elements that mapped up to 1 kb upstream of the TSSs and 5′ UTRs of the RefSeq transcripts were removed to avoid the contributions of signals at the start sites of known genes to the enrichment of regulatory elements at the start sites of lncRNAs mapping nearby. We computed the distances of the closest CAGE tags, CpG islands, RNA Pol II, and H3K27ac, H3K4me1 and H3K4me3 marks to the predicted TSSs for our set of 3,937 expressed intronic lncRNAs, 5,889 expressed antisense lncRNAs, and 5,866 expressed protein-coding mRNAs (associated with 12,162 RefSeq isoforms). The same analysis was performed for lncRNAs annotated in the GENCODE catalog (v.21), comprising 387 monoexonic intronic transcripts, 813 monoexonic antisense transcripts and 2,448 lincRNAs. Regulatory elements located at distances of up to 10 kb from the putative TSSs were considered. To test for the statistical significance of overlap distribution, we created 10 control datasets of randomly selected sequences for each group of sequences (intronic lncRNA, antisense lncRNA, GENCODE monoexonic intronic, GENCODE monoexonic antisense and GENCODE lincRNAs).
For each group, the random control sequences displayed the same number, length and genomic context (intronic or intergenic) as the test set. In addition, the control sequences did not match genomic regions with evidence of expression based on the strand-oriented ENCODE/CSHL Long RNAseq contig data set.24 The Kolmogorov-Smirnov (KS) test was used to compare the continuous probability distributions of the abundance of each relevant genomic marker with those calculated for the random sequence sets, using a p < 0.05 threshold.
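The two computations described above, the distance from each predicted TSS to the closest regulatory mark within 10 kb and the comparison of test and control distributions, can be sketched in Python as follows. This is a simplified stand-in for the BEDTools-based analysis: it assumes that each mark is reduced to a single coordinate on the same chromosome as the TSS, the function names and coordinates are hypothetical, and only the KS statistic D is computed, not its p-value.

```python
from bisect import bisect_left, bisect_right

def nearest_distance(tss, sorted_marks, max_dist=10_000):
    """Distance from a TSS to the closest mark (positions sorted ascending),
    or None if no mark lies within max_dist."""
    i = bisect_left(sorted_marks, tss)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_marks[max(i - 1, 0):i + 1]
    if not candidates:
        return None
    d = min(abs(m - tss) for m in candidates)
    return d if d <= max_dist else None

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D: the largest gap between
    the empirical cumulative distributions of samples a and b."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for v in sorted(set(a) | set(b)):
        fa = bisect_right(a, v) / len(a)
        fb = bisect_right(b, v) / len(b)
        d = max(d, abs(fa - fb))
    return d

# Hypothetical mark positions and TSSs on one chromosome.
marks = [100, 5_000, 30_000]
lncrna_tss = [50, 4_000, 29_500]
random_tss = [12_000, 18_000, 45_000]

test_dist = [nearest_distance(t, marks) for t in lncrna_tss]
ctrl_dist = [nearest_distance(t, marks) for t in random_tss]
# TSSs without a mark inside the 10 kb window are dropped before comparison.
test_dist = [d for d in test_dist if d is not None]
ctrl_dist = [d for d in ctrl_dist if d is not None]
print(test_dist, ctrl_dist, ks_statistic(test_dist, ctrl_dist))
```

In practice a statistics package would be used to convert D into a p-value for the 0.05 threshold.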
Conservation analysis
TransMap tracks with cross-species pairwise genome alignments that identify expression conservation across 15 species (orangutan, rhesus, mouse, rat, cow, chicken, dog, horse, rabbit, zebra finch, medaka, zebrafish, lamprey, stickleback and western clawed frog)46 were downloaded from the UCSC Genome Browser (hg19, http://genome.ucsc.edu). Next, we cross-referenced the genomic coordinates from the TransMap alignments with our set of 3,937 intronic lncRNAs and 5,889 antisense lncRNAs. Only transcripts displaying at least 10% overlap with TransMap regions were considered further. The same analysis was performed with lncRNAs annotated in the GENCODE catalog (v. 21). Statistical significance was ascertained using random sequence sets with the same number, length and genomic context, as described above. The chi-square test was performed to compare the lncRNA sets and random groups, using p < 0.05 as the significance threshold. We also evaluated the expression conservation of homologous lncRNAs based on the sequence similarity of ESTs. For this analysis, we randomly chose an equal number of ESTs (n = 36,700) from 15 vertebrate species and searched for similarity between these cDNA sequences and our set of intronic/antisense lncRNAs using BLASTN.81 Only the hits with an e-value of <10^−10 and a similarity of at least 70% were further assessed.
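The two filters described above can be sketched in Python as follows. This is an illustration only: the function names and coordinates are hypothetical, intervals are assumed to be half-open and on the same chromosome, and the 70% similarity criterion is assumed to correspond to the percent identity reported by BLASTN.

```python
def covered_fraction(start, end, regions):
    """Fraction of the half-open interval [start, end) covered by regions,
    counting overlapping regions only once."""
    length = end - start
    if length <= 0:
        raise ValueError("end must be greater than start")
    covered = 0
    cursor = start  # everything left of cursor is already accounted for
    for r_start, r_end in sorted(regions):
        lo = max(r_start, cursor)
        hi = min(r_end, end)
        if hi > lo:
            covered += hi - lo
            cursor = hi
    return covered / length

def keep_blast_hit(evalue, identity_pct):
    """Apply the e-value (< 1e-10) and similarity (>= 70%) thresholds."""
    return evalue < 1e-10 and identity_pct >= 70.0

# Hypothetical lncRNA coordinates and aligned TransMap regions.
transcripts = {"lnc_A": (100, 200), "lnc_B": (1_000, 2_000)}
transmap = [(90, 105), (100, 110), (150, 160), (1_950, 2_500)]

conserved = [
    name
    for name, (start, end) in transcripts.items()
    if covered_fraction(start, end, transmap) >= 0.10
]
print(conserved)  # lnc_A is 20% covered; lnc_B is only 5% covered
```

Merging overlapping regions before summing prevents a transcript from passing the 10% threshold merely because several alignments cover the same bases.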
For the analysis of conserved DNA elements, we retrieved data for vertebrates (phastCons 46way vertebrates), mammals (phastCons 46way placental) and primates (phastCons 46way primates) from the UCSC Genome Browser (hg19, http://genome.ucsc.edu). We cross-referenced the coordinates from our data set of intronic/antisense lncRNAs and GENCODE lncRNAs with those from each of these data sets using BEDTools.79 The coverageBed option was used to determine the coverage level of the lncRNA sequences according to the phastCons conserved DNA elements. The number of overlapping conserved elements in each group was normalized by the total number of conserved elements in each data set. Random sequence sets with the same numbers, lengths and genomic context were used as controls to determine statistical significance, as described above. Statistical analysis was conducted using the chi-square test, and only the elements with p < 0.05 were further assessed.
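A chi-square comparison between a lncRNA set and a random control set can be sketched in Python for a single 2 × 2 table. The counts below are hypothetical, the function name is an assumption, and the sketch does not address how the 10 random sets were combined, which is not specified here. For one degree of freedom, the p-value follows from the complementary error function.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom, no continuity
    correction) for the table [[a, b], [c, d]]. Returns (chi2, p)."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

# Hypothetical counts: sequences that do / do not overlap conserved elements.
lnc_overlap, lnc_total = 300, 1_000
rnd_overlap, rnd_total = 200, 1_000

chi2, p = chi_square_2x2(
    lnc_overlap, lnc_total - lnc_overlap,
    rnd_overlap, rnd_total - rnd_overlap,
)
print(f"chi2 = {chi2:.2f}, significant at 0.05: {p < 0.05}")
```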
The RNAz package47 was used with the default parameters (length = 120 bp and window = 40 bp) to predict conserved and thermodynamically stable secondary structures in the intronic/antisense lncRNA data sets. We used the multi-alignment file in UCSC with 46 vertebrate species as input. Only predicted structures with P > 0.5 were considered to contain conserved secondary structures. The chi-square test was performed to compare the sets of intronic and antisense lncRNAs with 10 sets of random sequences with the same numbers, lengths and genomic context, as described above. Only p-values of < 0.05 were accepted.
Coding potential
The Coding Potential Calculator (CPC) was used with the default parameters to evaluate the protein-coding potentials of the intronic and antisense lncRNAs.82
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Funding
This work was supported by a grant from the Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP). A.C.A., A.C.T., L.C. and F.C.B. received fellowships from FAPESP. E.M.R. and S.V.A. received investigator fellowship awards from CNPq.
Supplemental Material
Supplemental data for this article can be accessed on the publisher's website.
References
- 1. Bernstein BE, Birney E, Dunham I, Green ED, Gunter C, Snyder M. An integrated encyclopedia of DNA elements in the human genome. Nature 2012; 489:57-74; PMID:22955616; http://dx.doi.org/10.1038/nature11247
- 2. Esteller M. Non-coding RNAs in human disease. Nat Rev Genet 2011; 12:861-74; PMID:22094949; http://dx.doi.org/10.1038/nrg3074
- 3. Derrien T, Johnson R, Bussotti G, Tanzer A, Djebali S, Tilgner H, Guernec G, Martin D, Merkel A, Knowles DG, et al. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res 2012; 22:1775-89; PMID:22955988; http://dx.doi.org/10.1101/gr.132159.111
- 4. Cabili MN, Trapnell C, Goff L, Koziol M, Tazon-Vega B, Regev A, Rinn JL. Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev 2011; 25:1915-27; PMID:21890647; http://dx.doi.org/10.1101/gad.17446611
- 5. Mercer TR, Dinger ME, Sunkin SM, Mehler MF, Mattick JS. Specific expression of long noncoding RNAs in the mouse brain. Proc Natl Acad Sci U S A 2008; 105:716-21; PMID:18184812; http://dx.doi.org/10.1073/pnas.0706729105
- 6. van Heesch S, van Iterson M, Jacobi J, Boymans S, Essers PB, de Bruijn E, Hao W, MacInnes AW, Cuppen E, Simonis M. Extensive localization of long noncoding RNAs to the cytosol and mono- and polyribosomal complexes. Genome Biol 2014; 15:R6; PMID:24393600; http://dx.doi.org/10.1186/gb-2014-15-1-r6
- 7. Beckedorff FC, Amaral MS, Deocesano-Pereira C, Verjovski-Almeida S. Long non-coding RNAs and their implications in cancer epigenetics. Biosci Rep 2013; 33: pii: e00061; PMID:23875687; http://dx.doi.org/10.1042/BSR20130054
- 8. Kung JT, Colognori D, Lee JT. Long noncoding RNAs: past, present, and future. Genetics 2013; 193:651-69; PMID:23463798; http://dx.doi.org/10.1534/genetics.112.146704
- 9. Geisler S, Coller J. RNA in unexpected places: long non-coding RNA functions in diverse cellular contexts. Nat Rev Mol Cell Biol 2013; 14:699-712; PMID:24105322; http://dx.doi.org/10.1038/nrm3679
- 10. DeOcesano-Pereira C, Amaral MS, Parreira KS, Ayupe AC, Jacysyn JF, Amarante-Mendes GP, Reis EM, Verjovski-Almeida S. Long non-coding RNA INXS is a critical mediator of BCL-XS induced apoptosis. Nucleic Acids Res 2014; 42:8343-55; PMID:24992962; http://dx.doi.org/10.1093/nar/gku561 [Retracted]
- 11. Gong C, Maquat LE. lncRNAs transactivate STAU1-mediated mRNA decay by duplexing with 3′ UTRs via Alu elements. Nature 2011; 470:284-8; PMID:21307942; http://dx.doi.org/10.1038/nature09701
- 12. Keene JD. Minireview: global regulation and dynamics of ribonucleic acid. Endocrinology 2010; 151:1391-7; PMID:20332203; http://dx.doi.org/10.1210/en.2009-1250
- 13. Tani H, Akimitsu N. Genome-wide technology for determining RNA stability in mammalian cells: historical perspective and recent advantages based on modified nucleotide labeling. RNA Biol 2012; 9:1233-8; PMID:23034600; http://dx.doi.org/10.4161/rna.22036
- 14. Geisler S, Lojek L, Khalil AM, Baker KE, Coller J. Decapping of long noncoding RNAs regulates inducible genes. Mol Cell 2012; 45:279-91; PMID:22226051; http://dx.doi.org/10.1016/j.molcel.2011.11.025
- 15. Wilusz JE, Freier SM, Spector DL. 3′ end processing of a long nuclear-retained noncoding RNA yields a tRNA-like cytoplasmic RNA. Cell 2008; 135:919-32; PMID:19041754; http://dx.doi.org/10.1016/j.cell.2008.10.012
- 16. Sunwoo H, Dinger ME, Wilusz JE, Amaral PP, Mattick JS, Spector DL. MEN epsilon/beta nuclear-retained non-coding RNAs are up-regulated upon muscle differentiation and are essential components of paraspeckles. Genome Res 2009; 19:347-59; PMID:19106332; http://dx.doi.org/10.1101/gr.087775.108
- 17. Yang L, Duff MO, Graveley BR, Carmichael GG, Chen LL. Genomewide characterization of non-polyadenylated RNAs. Genome Biol 2011; 12:R16; PMID:21324177; http://dx.doi.org/10.1186/gb-2011-12-2-r16
- 18. Zhang Y, Zhang XO, Chen T, Xiang JF, Yin QF, Xing YH, Zhu S, Yang L, Chen LL. Circular Intronic Long Noncoding RNAs. Mol Cell 2013; 51:792-806; PMID:24035497; http://dx.doi.org/10.1016/j.molcel.2013.08.017
- 19. Nakaya HI, Amaral PP, Louro R, Lopes A, Fachel AA, Moreira YB, El-Jundi TA, da Silva AM, Reis EM, Verjovski-Almeida S. Genome mapping and expression analyses of human intronic noncoding RNAs reveal tissue-specific patterns and enrichment in genes related to regulation of transcription. Genome Biol 2007; 8:R43; PMID:17386095; http://dx.doi.org/10.1186/gb-2007-8-3-r43
- 20. Rearick D, Prakash A, McSweeny A, Shepard SS, Fedorova L, Fedorov A. Critical association of ncRNA with introns. Nucleic Acids Res 2011; 39:2357-66; PMID:21071396; http://dx.doi.org/10.1093/nar/gkq1080
- 21. St Laurent G, Shtokalo D, Tackett MR, Yang Z, Eremina T, Wahlestedt C, Urcuqui-Inchima S, Seilheimer B, McCaffrey TA, Kapranov P. Intronic RNAs constitute the major fraction of the non-coding RNA in mammalian cells. BMC Genomics 2012; 13:504; PMID:23006825; http://dx.doi.org/10.1186/1471-2164-13-504
- 22. Louro R, Smirnova AS, Verjovski-Almeida S. Long intronic noncoding RNA transcription: expression noise or expression choice? Genomics 2009; 93:291-8; PMID:19071207; http://dx.doi.org/10.1016/j.ygeno.2008.11.009
- 23. Xie C, Yuan J, Li H, Li M, Zhao G, Bu D, Zhu W, Wu W, Chen R, Zhao Y. NONCODEv4: exploring the world of long non-coding RNA genes. Nucleic Acids Res 2014; 42:D98-103; PMID:24285305; http://dx.doi.org/10.1093/nar/gkt1222
- 24. Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A, Tanzer A, Lagarde J, Lin W, Schlesinger F, et al. Landscape of transcription in human cells. Nature 2012; 489:101-8; PMID:22955620; http://dx.doi.org/10.1038/nature11233
- 25. Gu M, Lima CD. Processing the message: structural insights into capping and decapping mRNA. Curr Opin Struct Biol 2005; 15:99-106; PMID:15718140; http://dx.doi.org/10.1016/j.sbi.2005.01.009
- 26. Sharova LV, Sharov AA, Nedorezov T, Piao Y, Shaik N, Ko MS. Database for mRNA half-life of 19 977 genes obtained by DNA microarray analysis of pluripotent and differentiating mouse embryonic stem cells. DNA Res 2009; 16:45-58; PMID:19001483
- 27. Clark MB, Johnston RL, Inostroza-Ponta M, Fox AH, Fortini E, Moscato P, Dinger ME, Mattick JS. Genome-wide analysis of long noncoding RNA stability. Genome Res 2012; 22:885-98; PMID:22406755; http://dx.doi.org/10.1101/gr.131037.111
- 28. Tani H, Mizutani R, Salam KA, Tano K, Ijiri K, Wakamatsu A, Isogai T, Suzuki Y, Akimitsu N. Genome-wide determination of RNA stability reveals hundreds of short-lived noncoding transcripts in mammals. Genome Res 2012; 22:947-56; PMID:22369889; http://dx.doi.org/10.1101/gr.130559.111
- 29. Yang E, van Nimwegen E, Zavolan M, Rajewsky N, Schroeder M, Magnasco M, Darnell JE Jr. Decay rates of human mRNAs: correlation with functional characteristics and sequence attributes. Genome Res 2003; 13:1863-72; PMID:12902380; http://dx.doi.org/10.1101/gr.997703
- 30. Lam LT, Pickeral OK, Peng AC, Rosenwald A, Hurt EM, Giltnane JM, Averett LM, Zhao H, Davis RE, Sathyamoorthy M, et al. Genomic-scale measurement of mRNA turnover and the mechanisms of action of the anti-cancer drug flavopiridol. Genome Biol 2001; 2:RESEARCH0041; PMID:11597333; http://dx.doi.org/10.1186/gb-2001-2-10-research0041
- 31. Faghihi MA, Modarresi F, Khalil AM, Wood DE, Sahagan BG, Morgan TE, Finch CE, St Laurent G 3rd, Kenny PJ, Wahlestedt C. Expression of a noncoding RNA is elevated in Alzheimer's disease and drives rapid feed-forward regulation of beta-secretase. Nat Med 2008; 14:723-30; PMID:18587408; http://dx.doi.org/10.1038/nm1784
- 32. Sharma S, Findlay GM, Bandukwala HS, Oberdoerffer S, Baust B, Li Z, Schmidt V, Hogan PG, Sacks DB, Rao A. Dephosphorylation of the nuclear factor of activated T cells (NFAT) transcription factor is regulated by an RNA-protein scaffold complex. Proc Natl Acad Sci U S A 2011; 108:11381-6; PMID:21709260; http://dx.doi.org/10.1073/pnas.1019711108
- 33. Yoon JH, Abdelmohsen K, Srikantan S, Yang X, Martindale JL, De S, Huarte M, Zhan M, Becker KG, Gorospe M. LincRNA-p21 suppresses target mRNA translation. Mol Cell 2012; 47:648-55; PMID:22841487; http://dx.doi.org/10.1016/j.molcel.2012.06.027
- 34. Guttman M, Russell P, Ingolia NT, Weissman JS, Lander ES. Ribosome profiling provides evidence that large noncoding RNAs do not encode proteins. Cell 2013; 154:240-51; PMID:23810193; http://dx.doi.org/10.1016/j.cell.2013.06.009
- 35. Barthelson RA, Lambert GM, Vanier C, Lynch RM, Galbraith DW. Comparison of the contributions of the nuclear and cytoplasmic compartments to global gene expression in human cells. BMC Genomics 2007; 8:340; PMID:17894886; http://dx.doi.org/10.1186/1471-2164-8-340
- 36. Solnestam BW, Stranneheim H, Hallman J, Kaller M, Lundberg E, Lundeberg J, Akan P. Comparison of total and cytoplasmic mRNA reveals global regulation by nuclear retention and miRNAs. BMC Genomics 2012; 13:574; PMID:23110385; http://dx.doi.org/10.1186/1471-2164-13-574
- 37. Prasanth KV, Prasanth SG, Xuan Z, Hearn S, Freier SM, Bennett CF, Zhang MQ, Spector DL. Regulating gene expression through RNA nuclear retention. Cell 2005; 123:249-63; PMID:16239143; http://dx.doi.org/10.1016/j.cell.2005.08.033
- 38. Ravasi T, Suzuki H, Pang KC, Katayama S, Furuno M, Okunishi R, Fukuda S, Ru K, Frith MC, Gongora MM, et al. Experimental validation of the regulated expression of large numbers of non-coding RNAs from the mouse genome. Genome Res 2006; 16:11-9; PMID:16344565; http://dx.doi.org/10.1101/gr.4200206
- 39. Guttman M, Donaghey J, Carey BW, Garber M, Grenier JK, Munson G, Young G, Lucas AB, Ach R, Bruhn L, et al. lincRNAs act in the circuitry controlling pluripotency and differentiation. Nature 2011; 477:295-300; PMID:21874018
- 40. Reis EM, Nakaya HI, Louro R, Canavez FC, Flatschart AV, Almeida GT, Egidio CM, Paquola AC, Machado AA, Festa F, et al. Antisense intronic non-coding RNA levels correlate to the degree of tumor differentiation in prostate cancer. Oncogene 2004; 23:6684-92; PMID:15221013; http://dx.doi.org/10.1038/sj.onc.1207880
- 41. Tahira AC, Kubrusly MS, Faria MF, Dazzani B, Fonseca RS, Maracaja-Coutinho V, Verjovski-Almeida S, Machado MC, Reis EM. Long noncoding intronic RNAs are differentially expressed in primary and metastatic pancreatic cancer. Mol Cancer 2011; 10:141; PMID:22078386; http://dx.doi.org/10.1186/1476-4598-10-141
- 42. The ENCODE Project Consortium. A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol 2011; 9:e1001046; PMID:21526222; http://dx.doi.org/10.1371/journal.pbio.1001046
- 43. De Backer O, Arden KC, Boretti M, Vantomme V, De Smet C, Czekay S, Viars CS, De Plaen E, Brasseur F, Chomez P, et al. Characterization of the GAGE genes that are expressed in various human cancers and in normal testis. Cancer Res 1999; 59:3157-65; PMID:10397259
- 44. Jhaveri MS, Morrow CS. Methylation-mediated regulation of the glutathione S-transferase P1 gene in human breast cancer cells. Gene 1998; 210:1-7; PMID:9524203; http://dx.doi.org/10.1016/S0378-1119(98)00021-3
- 45. Sato N, Fukushima N, Maitra A, Matsubayashi H, Yeo CJ, Cameron JL, Hruban RH, Goggins M. Discovery of novel targets for aberrant methylation in pancreatic carcinoma using high-throughput microarrays. Cancer Res 2003; 63:3735-42; PMID:12839967
- 46. Zhu J, Sanborn JZ, Diekhans M, Lowe CB, Pringle TH, Haussler D. Comparative genomics search for losses of long-established genes on the human lineage. PLoS Comput Biol 2007; 3:e247; PMID:18085818; http://dx.doi.org/10.1371/journal.pcbi.0030247
- 47. Gruber AR, Neubock R, Hofacker IL, Washietl S. The RNAz web server: prediction of thermodynamically stable and evolutionarily conserved RNA structures. Nucleic Acids Res 2007; 35:W335-8; PMID:17452347; http://dx.doi.org/10.1093/nar/gkm222
- 48. Yu W, Gius D, Onyango P, Muldoon-Jacobs K, Karp J, Feinberg AP, Cui H. Epigenetic silencing of tumour suppressor gene p15 by its antisense RNA. Nature 2008; 451:202-6; PMID:18185590; http://dx.doi.org/10.1038/nature06468
- 49. Orom UA, Derrien T, Beringer M, Gumireddy K, Gardini A, Bussotti G, Lai F, Zytnicki M, Notredame C, Huang Q, et al. Long noncoding RNAs with enhancer-like function in human cells. Cell 2010; 143:46-58; PMID:20887892; http://dx.doi.org/10.1016/j.cell.2010.09.001
- 50. Ponjavic J, Oliver PL, Lunter G, Ponting CP. Genomic and transcriptional co-localization of protein-coding and long non-coding RNA pairs in the developing brain. PLoS Genet 2009; 5:e1000617; PMID:19696892; http://dx.doi.org/10.1371/journal.pgen.1000617
- 51. Beckedorff FC, Ayupe AC, Crocci-Souza R, Amaral MS, Nakaya HI, Soltys DT, Menck CF, Reis EM, Verjovski-Almeida S. The intronic long noncoding RNA ANRASSF1 recruits PRC2 to the RASSF1A promoter, reducing the expression of RASSF1A and increasing cell proliferation. PLoS Genet 2013; 9:e1003705; PMID:23990798; http://dx.doi.org/10.1371/journal.pgen.1003705
- 52. Gupta RA, Shah N, Wang KC, Kim J, Horlings HM, Wong DJ, Tsai MC, Hung T, Argani P, Rinn JL, et al. Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature 2010; 464:1071-6; PMID:20393566; http://dx.doi.org/10.1038/nature08975
- 53. Huarte M, Guttman M, Feldser D, Garber M, Koziol MJ, Kenzelmann-Broz D, Khalil AM, Zuk O, Amit I, Rabani M, et al. A large intergenic noncoding RNA induced by p53 mediates global gene repression in the p53 response. Cell 2010; 142:409-19; PMID:20673990; http://dx.doi.org/10.1016/j.cell.2010.06.040
- 54. Hung T, Wang Y, Lin MF, Koegel AK, Kotake Y, Grant GD, Horlings HM, Shah N, Umbricht C, Wang P, et al. Extensive and coordinated transcription of noncoding RNAs within cell-cycle promoters. Nat Genet 2011; 43:621-9; PMID:21642992; http://dx.doi.org/10.1038/ng.848
- 55. St Laurent G, Wahlestedt C, Kapranov P. The Landscape of long noncoding RNA classification. Trends Genet 2015; 31:239-51; PMID:25869999; http://dx.doi.org/10.1016/j.tig.2015.03.007
- 56. Clement JQ, Maiti S, Wilkinson MF. Localization and stability of introns spliced from the Pem homeobox gene. J Biol Chem 2001; 276:16919-30; PMID:11278282; http://dx.doi.org/10.1074/jbc.M005104200
- 57. Guil S, Soler M, Portela A, Carrere J, Fonalleras E, Gomez A, Villanueva A, Esteller M. Intronic RNAs mediate EZH2 regulation of epigenetic targets. Nat Struct Mol Biol 2012; 19:664-70; PMID:22659877; http://dx.doi.org/10.1038/nsmb.2315
- 58. Yin QF, Yang L, Zhang Y, Xiang JF, Wu YW, Carmichael GG, Chen LL. Long noncoding RNAs with snoRNA ends. Mol Cell 2012; 48:219-30; PMID:22959273; http://dx.doi.org/10.1016/j.molcel.2012.07.033
- 59. Andersson R, Refsing Andersen P, Valen E, Core LJ, Bornholdt J, Boyd M, Heick Jensen T, Sandelin A. Nuclear stability and transcriptional directionality separate functionally distinct RNA species. Nat Commun 2014; 5:5336; PMID:25387874; http://dx.doi.org/10.1038/ncomms6336
- 60. Mercer TR, Dinger ME, Bracken CP, Kolle G, Szubert JM, Korbie DJ, Askarian-Amiri ME, Gardiner BB, Goodall GJ, Grimmond SM, et al. Regulated post-transcriptional RNA cleavage diversifies the eukaryotic transcriptome. Genome Res 2010; 20:1639-50; PMID:21045082; http://dx.doi.org/10.1101/gr.112128.110
- 61. The ENCODE Project Consortium. Post-transcriptional processing generates a diversity of 5′-modified long and short RNAs. Nature 2009; 457:1028-32; PMID:19169241; http://dx.doi.org/10.1038/nature07759
- 62. Seidl CI, Stricker SH, Barlow DP. The imprinted Air ncRNA is an atypical RNAPII transcript that evades splicing and escapes nuclear export. EMBO J 2006; 25:3565-75; PMID:16874305; http://dx.doi.org/10.1038/sj.emboj.7601245
- 63. Heo JB, Sung S. Vernalization-mediated epigenetic silencing by a long intronic noncoding RNA. Science 2011; 331:76-9; PMID:21127216; http://dx.doi.org/10.1126/science.1197349
- 64. Takayama K, Horie-Inoue K, Katayama S, Suzuki T, Tsutsumi S, Ikeda K, Urano T, Fujimura T, Takagi K, Takahashi S, et al. Androgen-responsive long noncoding RNA CTBP1-AS promotes prostate cancer. EMBO J 2013; 32:1665-80; PMID:23644382
- 65. Fachel AA, Tahira AC, Vilella-Arias SA, Maracaja-Coutinho V, Gimba ER, Vignal GM, Campos FS, Reis EM, Verjovski-Almeida S. Expression analysis and in silico characterization of intronic long noncoding RNAs in renal cell carcinoma: emerging functional associations. Mol Cancer 2013; 12:140; PMID:24238219; http://dx.doi.org/10.1186/1476-4598-12-140
- 66. Ng SY, Johnson R, Stanton LW. Human long non-coding RNAs promote pluripotency and neuronal differentiation by association with chromatin modifiers and transcription factors. EMBO J 2012; 31:522-33; PMID:22193719; http://dx.doi.org/10.1038/emboj.2011.459
- 67. Guttman M, Amit I, Garber M, French C, Lin MF, Feldser D, Huarte M, Zuk O, Carey BW, Cassady JP. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 2009; 458:223-7; PMID:19182780; http://dx.doi.org/10.1038/nature07672
- 68. Kowalczyk MS, Hughes JR, Garrick D, Lynch MD, Sharpe JA, Sloane-Stanley JA, McGowan SJ, De Gobbi M, Hosseini M, Vernimmen D, et al. Intragenic enhancers act as alternative promoters. Mol Cell 2012; 45:447-58; PMID:22264824; http://dx.doi.org/10.1016/j.molcel.2011.12.021
- 69. Necsulea A, Soumillon M, Warnefors M, Liechti A, Daish T, Zeller U, Baker JC, Grützner F, Kaessmann H. The evolution of lncRNA repertoires and expression patterns in tetrapods. Nature 2014; 505:635-40; PMID:24463510; http://dx.doi.org/10.1038/nature12943
- 70. Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A 2001; 98:5116-21; PMID:11309499; http://dx.doi.org/10.1073/pnas.091062498
- 71. Tycowski KT, Shu MD, Steitz JA. A small nucleolar RNA is processed from an intron of the human gene encoding ribosomal protein S3. Genes Dev 1993; 7:1176-90; PMID:8319909; http://dx.doi.org/10.1101/gad.7.7a.1176
- 72. Dani C, Blanchard JM, Piechaczyk M, El Sabouty S, Marty L, Jeanteur P. Extreme instability of myc mRNA in normal and transformed human cells. Proc Natl Acad Sci U S A 1984; 81:7046-50; PMID:6594679; http://dx.doi.org/10.1073/pnas.81.22.7046
- 73. Topisirovic I, Culjkovic B, Cohen N, Perez JM, Skrabanek L, Borden KL. The proline-rich homeodomain protein, PRH, is a tissue-specific inhibitor of eIF4E-dependent cyclin D1 mRNA transport and growth. EMBO J 2003; 22:689-703; PMID:12554669; http://dx.doi.org/10.1093/emboj/cdg069
- 74. Meister G, Landthaler M, Patkaniowska A, Dorsett Y, Teng G, Tuschl T. Human Argonaute2 mediates RNA cleavage targeted by miRNAs and siRNAs. Mol Cell 2004; 15:185-97; PMID:15260970; http://dx.doi.org/10.1016/j.molcel.2004.07.007
- 75. Bolstad BM, Irizarry RA, Astrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 2003; 19:185-93; PMID:12538238; http://dx.doi.org/10.1093/bioinformatics/19.2.185
- 76. Pfaffl MW. A new mathematical model for relative quantification in real-time RT-PCR. Nucleic Acids Res 2001; 29:e45; PMID:11328886; http://dx.doi.org/10.1093/nar/29.9.e45
- 77. Wang J, Zhou X, Zhu J, Gu Y, Zhao W, Zou J, Guo Z. GO-function: deriving biologically relevant functions from statistically significant functions. Brief Bioinform 2012; 13:216-27; PMID:21705405; http://dx.doi.org/10.1093/bib/bbr041
- 78. Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. Ann Stat 2001; 29:1165-88; http://dx.doi.org/10.1214/aos/1013699998
- 79. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 2010; 26:841-2; PMID:20110278; http://dx.doi.org/10.1093/bioinformatics/btq033
- 80. Bock C, Walter J, Paulsen M, Lengauer T. CpG island mapping by epigenome prediction. PLoS Comput Biol 2007; 3:e110; PMID:17559301; http://dx.doi.org/10.1371/journal.pcbi.0030110
- 81. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol 1990; 215:403-10; PMID:2231712; http://dx.doi.org/10.1016/S0022-2836(05)80360-2
- 82. Kong L, Zhang Y, Ye ZQ, Liu XQ, Zhao SQ, Wei L, Gao G. CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res 2007; 35:W345-9; PMID:17631615; http://dx.doi.org/10.1093/nar/gkm391