Abstract
Although not canonically polyadenylated, the long noncoding RNA MALAT1 (metastasis-associated lung adenocarcinoma transcript 1) is stabilized by a highly conserved 76-nt triple helix structure on its 3′ end. The entire MALAT1 transcript is over 8000 nt long in humans. The strongest structural conservation signal in MALAT1 (as measured by covariation of base pairs) is in the triple helix structure. Primary sequence analysis of covariation alone does not reveal the degree of structural conservation of the entire full-length transcript, however. Furthermore, RNA structure is often context dependent; RNA binding proteins that are differentially expressed in different cell types may alter structure. We investigate here the in-cell and cell-free structures of the full-length human and green monkey (Chlorocebus sabaeus) MALAT1 transcripts in multiple tissue-derived cell lines using SHAPE chemical probing. Our data reveal levels of uniform structural conservation in different cell lines, in cells and cell-free, and even between species, despite significant differences in primary sequence. The uniformity of the structural conservation across the entire transcript suggests that, despite seeing covariation signals only in the triple helix junction of the lncRNA, the rest of the transcript's structure is remarkably conserved, at least in primates and across multiple cell types and conditions.
Keywords: green monkey, MALAT1, primates, RNA structure, SHAPE
INTRODUCTION
MALAT1 (metastasis-associated lung adenocarcinoma transcript 1) is an ∼8 kb long noncoding RNA (lncRNA) (Ji et al. 2003). It is one of the few highly expressed and highly conserved long noncoding RNAs with multiple functions in both tumors and normal cells (Hutchinson et al. 2007; Ulitsky and Bartel 2013; Arun et al. 2020). MALAT1 is conserved both at the sequence and RNA secondary structure level (Zhang et al. 2017; Tavares et al. 2019; Rivas 2021). MALAT1 has a known and conserved triple helical tertiary structure on its 3′ end (Wilusz et al. 2008, 2012; Brown et al. 2012, 2014; Zhang et al. 2017). Additionally, covariation analysis has supported other specific secondary structures near the triple helical region (Zhang et al. 2017; Rivas 2021). As a result, a careful in-cell RNA structural characterization of full-length MALAT1 has the potential to reveal previously unknown RNA structures (McCown et al. 2019). In addition, it can reveal the role of the cellular environment in modulating the underlying structure.
The size of MALAT1 (∼8 kb) makes it quite similar in length to many human messenger RNAs (Waldern et al. 2021). It is transcribed by RNA polymerase II, but, unlike mRNAs in the cell, MALAT1 is largely retained in the nucleus and is not actively translated by the ribosome (Hutchinson et al. 2007; Tripathi et al. 2010; Djebali et al. 2012; Ingolia et al. 2014). MALAT1 undergoes noncanonical processing; RNase P cleaves the MALAT1 3′ end to produce a tRNA-like small molecule called mascRNA and the mature MALAT1 transcript. MALAT1 mature sequence is not polyadenylated but rather has a genetically encoded A-rich end that is critical in the formation of the triple helix (Brown et al. 2012, 2014; Wilusz et al. 2012). The cleaved MALAT1 3′ end folds back onto itself, creating the triple-helical architecture resistant to exonuclease-mediated decay (Brown et al. 2012, 2014; Wilusz et al. 2012). Thus, mature MALAT1 lncRNA is stable and is highly expressed in a broad range of cell types (Hutchinson et al. 2007; Zhang et al. 2012).
MALAT1 has many proposed functions in the cell ranging from pre-mRNA regulation of splicing to transcriptional regulation, including interacting with transcriptional repressive complex PRC2, interacting with transcription factors, and acting as a competing endogenous RNA (ceRNA) (recently reviewed by Arun et al. 2020). It is generally overexpressed in tumors; however, a Malat1 deletion resulted in a reduction in lung metastases in one study (Gutschner et al. 2013), while another study with a Malat1 knockout promoted metastasis (Kim et al. 2018). Additionally, murine knockout studies of Malat1 have shown that it is nonessential for life (Nakagawa et al. 2012; Zhang et al. 2012). Although MALAT1 has been extensively studied, its endogenous and oncogenic function has yet to be fully elucidated.
Chemical probing techniques allow us to probe RNA structure with single nucleotide resolution in a wide variety of conditions (Deigan et al. 2009; Hajdin et al. 2013; Woods and Laederach 2017; Lackey et al. 2018; Smola and Weeks 2018; Spasic et al. 2018). Furthermore, with the throughput of next generation sequencing, it is possible to obtain high-resolution quantitative data even on large transcripts (Kutchko et al. 2018; Lackey et al. 2018; Mustoe et al. 2018; Smola and Weeks 2018). It is important to distinguish between transcriptome-wide studies, where low to medium coverage provides semiquantitative results on all transcripts, and high precision targeted amplification strategies that offer quantitative chemical probing reactivities across entire transcripts (Weeks 2015; Smola et al. 2016; Busan and Weeks 2017; Woods et al. 2017). In this study, we obtain high-resolution quantitative SHAPE-MaP (selective 2′ hydroxyl acylation by primer extension with mutational profiling) data on the MALAT1 lncRNA to investigate subtle changes in RNA structure across three cell lines in two species (Siegfried et al. 2014; Mustoe et al. 2018; Smola and Weeks 2018). This differs from computational studies that have used chemical and enzymatic probing data from different labs to study the structure of MALAT1 (Wang et al. 2021). In particular, recent computational studies of MALAT1 secondary structure have suggested the lncRNA is highly sensitive to the cellular environment, RNA binding proteins, microRNAs and even U1 snRNA (McCown et al. 2019; Wang et al. 2021). However, these studies have relied on data collected in different labs, using different chemical probes. A caveat in such analyses is that the data may differ simply because they were collected using different probes and protocols.
By collecting highly reproducible chemical probing data in the same laboratory but across different cell lines and species, we can determine how much of the differences in predicted structures are due to variability across techniques and how they are interpreted by RNA folding algorithms versus actual biological differences.
Using these data, we analyze how the cellular environment (in-cell vs. cell-free) affects MALAT1 lncRNA structure. We also investigate how evolutionary substitution in primates affects these structures. Our findings rely on the very high level of quantitative reproducibility of targeted SHAPE-MaP data to reveal a robust set of structures in the MALAT1 lncRNA throughout the entire transcript. We investigate these structures to determine how well they persist both in-cell and cell-free and with respect to evolutionary drift. Together, our work demonstrates a consistency in RNA structure in the MALAT1 lncRNA, even in regions that are not known to be essential for function. Our study therefore suggests that specific structures in very long noncoding RNAs like MALAT1 may be unaffected by changes in sequence between humans and African green monkeys.
RESULTS
In-cell primary structure of MALAT1 transcript
We begin our structural investigation of the MALAT1 lncRNA locus in the human genome by reporting its annotated genome coordinates and overall locus conservation in Figure 1A. As is currently reported in GRCh38 (GenBank Accession GCA_000001405.28), MALAT1 is located on chromosome 11. To display MALAT1's conservation, we use PhyloP data generated by comparing the genomes of 30 mammals, which can show both well-conserved (as indicated by positive PhyloP scores) and accelerating regions (negative scores) (Pollard et al. 2010). MALAT1 has three annotated transcript isoforms (NR_002819.4, NR_144567.1, and NR_144568.1). The full-length isoform (NR_002819.4) is unspliced, while NR_144567.1 is composed of two exons, and NR_144568.1 is composed of three exons. We performed RNA-seq experiments on Hek293 (ATCC CRL 1573) and A549 (ATCC CCL 185) cell lines to identify the regions of the MALAT1 transcript with the highest expression and which are readily reverse transcribed during library preparation (Supplemental Fig. 1A,B). We observed uniform coverage for NR_144567.1; however, only a small proportion of reads mapped to regions corresponding to the first exon. Thus, we concluded NR_144567.1 exon 2 is the dominantly expressed isoform in the cell lines we will investigate here. Our observation of expression <100 nucleotides downstream from NR_144567.1 exon 2 is consistent with what has been previously reported (Wilusz et al. 2008; Gutschner et al. 2013). We also observed no coverage on the 3′ end of the transcript in the triple-helix and mascRNA regions (Wilusz et al. 2008, 2012; Brown et al. 2012, 2014). This is likely due to the high level of structure in these regions and the inability of the reverse transcriptase to transcribe these regions (Gutschner et al. 2013; Sun et al. 2017; Xiping et al. 2018). We conclude that the primary structure of the lncRNA MALAT1 is the same in the systems we will study here.
FIGURE 1.
Human MALAT1 in-cell structure, SHAPE-MaP reproducibility, and structured regions. (A) MALAT1 genomic locus, phyloP conservation and structure for single exon NR_002819 and alternatively spliced NR_144567.1 transcripts (GRCh38). (B) Observed MALAT1 transcript in Hek293 cells by RNA-sequencing shows only exon 2 of NR_144567.1 is expressed, with a uniform expression profile. We designed 20 tiled and overlapping primers for amplicon amplification of SHAPE-MaP probing read through reverse transcription (labeled A–T) to cover the expressed area. (C) SHAPE-MaP data is highly reproducible in cell across replicates. Replicate 1 is plotted in black, replicate 2 in gray, and standard error as error bars. (D) Scatter plot showing linear relationship between in-cell replicates and high level of quantitative correlation (R2 = 0.72). Coloring is based on SHAPE reactivity convention, with low SHAPE black, medium yellow, and high red. (E) Median windowed SHAPE reactivity for entire MALAT1 lncRNA used to identify regions of low SHAPE which fold to unique, well-defined structures; it also shows almost complete coverage for the entire lncRNA. (F) Shannon entropy of the partition function computed using SHAPE as a pseudo-free-energy term in nearest neighbor modeling of RNA structure (Deigan et al. 2009). Regions of low median SHAPE and low Shannon entropy adopt well-defined structures as can be seen in (G) base-pairing probability arc plots for the entire MALAT1 RNA structure folded using SHAPE in-cell data. Select regions of low SHAPE and low Shannon entropy are enlarged and the corresponding minimum free-energy structures are shown in H. The minimum free-energy structure nucleotides are colored according to the low, medium, and high reactivity definitions described above.
In-cell secondary structure probing and modeling
We designed a series of 20 overlapping primer pairs for selective amplicon amplification of the expressed region of the MALAT1 transcript using selective 2′ hydroxyl acylation by primer extension with mutational profiling (SHAPE-MaP) (Siegfried et al. 2014; Smola et al. 2015; Lackey et al. 2018; Smola and Weeks 2018). These are labeled A–T in Figure 1B, and illustrate our overlapping strategy, with each primer pair amplifying 500–600 nt of the transcript, which corresponds to the maximum read length of an Illumina sequencer. We used the SHAPE reagent 5-nitroisatoic anhydride (5NIA) to probe the cellular RNA before TRIzol extraction and refolding, as outlined in Materials and Methods (Kutchko et al. 2015; Smola et al. 2015, 2016; Corley et al. 2017; Kutchko and Laederach 2017; Busan and Weeks 2018). In Figure 1C we show representative regions of our SHAPE data for two independent replicates of our in-cell data collected in A549 cells. As is summarized in the scatter plot in Figure 1D, these two replicates are highly correlated and the data are quantitatively reproducible (R2 = 0.78, slope 1.02) for low (black), medium (yellow), and high (red) SHAPE reactivities. This high reproducibility of our data is important as the subsequent analyses will include quantitative comparisons of these data across different conditions.
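The replicate comparison described above reduces to a linear fit between two reactivity profiles. The following is a minimal sketch, assuming an ordinary least-squares fit of replicate 2 against replicate 1; the function name `replicate_agreement` and the toy reactivities are illustrative and are not taken from the study or its analysis pipeline.

```python
from statistics import mean

def replicate_agreement(rep1, rep2):
    """Return (slope, r_squared) for an ordinary least-squares fit rep2 ~ rep1.

    Positions where either replicate has no data (None) are skipped.
    """
    pairs = [(x, y) for x, y in zip(rep1, rep2) if x is not None and y is not None]
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, r_squared

# Toy reactivities (synthetic, for illustration only).
rep1 = [0.05, 0.10, 0.90, 1.60, None, 0.30, 2.10, 0.00]
rep2 = [0.07, 0.08, 1.00, 1.50, 0.40, 0.35, 2.00, 0.02]
slope, r2 = replicate_agreement(rep1, rep2)
print(f"slope = {slope:.2f}, R^2 = {r2:.2f}")
```

A slope near 1 together with a high R2 indicates that the replicates agree in absolute scale as well as in pattern, which is what later cross-condition comparisons depend on.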
Figure 1E illustrates the median windowed SHAPE across the entire MALAT1 transcript for the combined average of both replicates, which will be used for comparisons going forward (Rice et al. 2014). It also illustrates that our primer amplicon strategy was successful at capturing high quality SHAPE data for most nucleotides spanning the MALAT1 transcript in cells. From these data we also compute the Shannon entropy (Fig. 1F) and identify regions of low SHAPE, low Shannon entropy indicated as vertical gray shading (Kennedy et al. 2008; Hamada et al. 2009; Busan and Weeks 2017). These regions are more likely to fold into single, well-defined RNA structures and be amenable to minimum free-energy structural analysis (Kutchko et al. 2018; Lackey et al. 2018; Dadonaite et al. 2019). In Figure 1G we show SHAPE-derived base-pairing probability arc plots for the entire transcript. The arcs observed on the SHAPE-derived models of base-pairing probability indicate high support for specific base pairs (Deigan et al. 2009; Hajdin et al. 2013; Spasic et al. 2018) and in Figure 1H we show the minimum free-energy models with SHAPE data derived from the experimental reactivities (Mathews 2006; Tyagi and Mathews 2007; Deigan et al. 2009) for three representative low SHAPE, low Shannon entropy regions. Together these results show that our tiling amplicon strategy produces quantitatively reproducible in cell structure profiling data for the observed region of the MALAT1 transcript and suggests we can use this approach to further investigate the effects of cellular environment and ultimately evolutionary substitutions on the MALAT1 structure.
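To make the low SHAPE, low Shannon entropy criterion concrete, the sketch below computes a sliding-window median and a per-nucleotide Shannon entropy from base-pairing probabilities. It assumes one common definition of the entropy (the sum of -p log10 p over a nucleotide's pairing partners) and a centered window truncated at the transcript ends. The cutoffs passed to `low_low_positions` are hypothetical parameters, since the thresholds are not stated in this section.

```python
import math
from statistics import median

def windowed_median(values, window=50):
    """Centered sliding-window median; windows are truncated at the ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = [v for v in values[max(0, i - half): i + half + 1] if v is not None]
        out.append(median(chunk) if chunk else None)
    return out

def shannon_entropy(pair_probs, n):
    """Per-nucleotide entropy from {(i, j): p} pairing probabilities (0-based)."""
    entropy = [0.0] * n
    for (i, j), p in pair_probs.items():
        if p > 0:
            term = -p * math.log10(p)
            entropy[i] += term  # each pair contributes to both partners
            entropy[j] += term
    return entropy

def low_low_positions(shape_med, entropy_med, shape_cut, entropy_cut):
    """Indices where both windowed medians fall below the given cutoffs."""
    return [
        i for i, (s, h) in enumerate(zip(shape_med, entropy_med))
        if s is not None and h is not None and s < shape_cut and h < entropy_cut
    ]

# Toy example (synthetic values, for illustration only).
shape = [0.1, 0.2, 0.1, 1.5, 2.0, 1.8]
probs = {(0, 2): 0.99, (3, 5): 0.5, (3, 4): 0.4}
shape_med = windowed_median(shape, window=3)
entropy_med = windowed_median(shannon_entropy(probs, len(shape)), window=3)
print(low_low_positions(shape_med, entropy_med, shape_cut=0.5, entropy_cut=0.1))
```

Using medians rather than means keeps a single highly reactive nucleotide from masking an otherwise well-structured window.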
Cell-free structural probing MALAT1 lncRNA
In Figure 2A we focus on exon 2 of the MALAT1 transcript (NR_144567.1), which we observed in our RNA-seq experiments. In Figure 2B we show that cell-free SHAPE data is highly reproducible across replicates (R2 = 0.88, slope 1.01). Here again, we used 5NIA to probe the cellular RNA, but performed the modification after TRIzol extraction and refolding as outlined in Materials and Methods (Kutchko et al. 2015; Weeks 2015; Corley et al. 2017; Smola and Weeks 2018). We compare cell-free data replicates obtained from RNA extracted from Hek293 and A549 cells. These results show that our cell-free refolding protocol is effective regardless of which cell type the RNA is extracted from, and that the MALAT1 lncRNA in the absence of proteins folds to a very similar (identical within experimental error) conformation regardless of cell of origin. From these data we can compute the averaged median SHAPE (Fig. 2C) and Shannon entropy (Fig. 2D). Most of the low SHAPE, low Shannon entropy regions identified in-cell overlap with those identified here cell-free.
FIGURE 2.
In-cell versus cell-free MALAT1 RNA structure reveals global changes in reactivity with minimal effect on highly structured regions. (A) Schematic of expressed MALAT1 transcript in human (GRCh38) genome coordinates we observed in both A549 and Hek293 cells. (B) Cell-free SHAPE-MaP data is highly reproducible across replicates and independent of cell line origin. (C) Median windowed (50 nt) SHAPE reactivity for cell-free data combined with (D) SHAPE-derived Shannon entropy identifies similar regions of high structure in both cell-free and in-cell SHAPE data sets. (E) Correlation analysis of SHAPE reactivities colored by low (black), medium (yellow), and high (red) reactivity shows a clear decrease in slope for highly reactive nucleotides, despite an overall correlation comparable to in-cell replicates. This is clearly visible in comparison plots of reactivities in F where we see that the in-cell data is significantly lower mostly for the highly reactive nucleotides. (G) Modeled base-pairing probabilities (arc diagrams) and corresponding minimum free-energy structures of low SHAPE, low Shannon entropy regions in the MALAT1 lncRNA suggesting in-cell and cell-free structures are very similar.
When we compare the SHAPE profiles for representative regions in cells (black) with cell-free (yellow) in Figure 2E we observe that the SHAPE pattern is overall the same. The nucleotides with the largest differences seem to be high SHAPE regions (colored as red, SHAPE > 0.8 in the side bar), with the cell-free reactivity in most cases higher than that of in-cell. This is further evident in the scatter plot illustrated in Figure 2F, where the slope of the highly reactive nucleotides is lower than that of the medium (yellow) and low (black) nucleotides. As a reminder, replicates have a slope of 1.00 in Figure 2B. The quality of our data therefore allows us to observe these subtle but important global differences in SHAPE reactivity.
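The class-specific slopes discussed above can be sketched as follows. This is a sketch under stated assumptions: nucleotides are binned by their cell-free reactivity, each class is fitted with a line through the origin, and the low/medium boundary of 0.4 is a hypothetical default (only the high cutoff of 0.8 is given in the text). The fitting procedure used for the figure is not specified here and may differ.

```python
def classify(reactivity, low_cut=0.4, high_cut=0.8):
    """Assign a reactivity to the low, medium, or high class."""
    if reactivity < low_cut:
        return "low"
    if reactivity <= high_cut:
        return "medium"
    return "high"

def slopes_by_class(cell_free, in_cell, low_cut=0.4, high_cut=0.8):
    """Slope of in-cell vs. cell-free reactivity per class (fit through origin)."""
    sums = {"low": [0.0, 0.0], "medium": [0.0, 0.0], "high": [0.0, 0.0]}
    for x, y in zip(cell_free, in_cell):
        if x is None or y is None:
            continue  # skip positions without data in either condition
        acc = sums[classify(x, low_cut, high_cut)]
        acc[0] += x * y  # running sum of x*y
        acc[1] += x * x  # running sum of x*x
    return {k: (sxy / sxx if sxx > 0 else None) for k, (sxy, sxx) in sums.items()}

# Toy data (synthetic): only the highly reactive positions are damped in cells.
cell_free = [0.1, 0.2, 0.5, 0.6, 1.0, 2.0]
in_cell = [0.1, 0.2, 0.5, 0.6, 0.5, 1.0]
print(slopes_by_class(cell_free, in_cell))
```

In this toy case the low and medium classes keep a slope of 1 while the high class drops, which is the qualitative signature attributed to shielding of the probe in cells.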
When we incorporate the cell-free data in secondary structure models of low SHAPE, low Shannon entropy regions (shown in Fig. 2G) we observe similar base-pairing probabilities and even minimum free-energy structures. These results contradict some other structural probing studies of cellular RNAs (Rouskin et al. 2014; Tomezsko et al. 2020), which reported very large differences for data collected in cell and cell-free conditions for messenger RNAs. It is important to note that we have collected data with high levels of reproducibility, as is evidenced by our replicate analysis in Figure 1. This requires very deep sequencing of SHAPE-MaP libraries and particular care when performing probing experiments. Interestingly, applying this approach we observe strong agreement between cell-free and in-cell conditions in our experiments. Nonetheless, we do observe a systematic difference, where highly reactive nucleotides tend to be less reactive in cells, which is consistent with a global shielding of the chemical probe by the cellular environment. When we model the low-SHAPE low-Shannon entropy regions incorporating cell-free and in-cell we obtain similar structures (Fig. 2G). It is the pattern of SHAPE reactivity and not its absolute value that directs structure prediction which explains this observation (Deigan et al. 2009; Hajdin et al. 2013; Spasic et al. 2018). Thus, it appears that RNA binding proteins and other cellular components do not have a strong impact on the structure of the MALAT1 RNA. Nonetheless, we cannot exclude that miRNAs and U1 snRNA, which have been suggested to bind MALAT1 and which would be present in our cell-free experiment may still be affecting the RNA structure (McCown et al. 2019; Wang et al. 2021). That being said, both U1 snRNA and miRNAs require RNA binding proteins to bind effectively, which are not present in our cell-free experiment, which makes it unlikely they are affecting MALAT1 structure in a significant way.
Effect of different cellular environments on MALAT1 lncRNA structure
So far, we have established that our amplicon tiling strategy produces quantitatively reproducible data both in cells and cell-free. This is essential, as even when we compare in-cell to cell-free conditions we obtain remarkably similar SHAPE profiles. Because the underlying protein free structure of MALAT1 is unchanged regardless of cell source (Fig. 2B), and because in-cell probing yields very similar reactivity profiles to probing of cell-free RNA (Fig. 2E), we asked whether we might observe differences in lncRNA structure if we compare SHAPE data collected from in-cell probing of two different cell lines. We therefore collected replicates of in-cell SHAPE data in Hek293 cells, which express the same transcript isoform observed in A549 cells (Fig. 3A; Supplemental Fig. 1B). In these cell lines we also obtain reproducible data across two replicates (Fig. 3B), from which we obtain median SHAPE profiles (Fig. 3C), and identify low SHAPE, low Shannon regions that mostly overlap with those previously identified (Fig. 3D). Indeed, when we compare the Hek293 averaged replicate data to A549 averaged replicate data (Fig. 3E), we observe a nearly identical pattern and a high correlation (R2 = 0.79). We do not observe any systematic differences between low, medium, and high SHAPE slopes, but the overall slope (Fig. 3E) is 0.78. This likely indicates that the 5NIA SHAPE reagent more easily permeates A549 cells, with a slightly lower overall reactivity in Hek293 cells. Nonetheless, the in-cell base-pairing probabilities and corresponding minimum free-energy models informed by these data are nearly identical (Fig. 3F) for low SHAPE, low Shannon entropy regions. Thus, although the specific cell line affects overall SHAPE reactivity (likely due to differences in overall cell permeability of the SHAPE reagent), the structure of MALAT1 is consistent across different cellular conditions.
FIGURE 3.
Human MALAT1 RNA structure is nearly identical in cells in two different cell lines. (A) Schematic of expressed MALAT1 transcript in human (GRCh38) genome coordinates observed in both A549 and Hek293 cells. (B) In-cell SHAPE-MaP data is highly reproducible across replicates in Hek293 cells. (C) Median windowed (50 nt) SHAPE reactivity for Hek293 in-cell data. (D) SHAPE-derived Shannon entropy identifies similar regions of high structure in Hek293 cell lines. (E) Correlation analysis of SHAPE reactivities colored by low (black), medium (yellow), and high (red) reactivity shows high correlation between Hek293 and A549 cells. (F) Modeled base-pairing probabilities (arc diagrams) and corresponding minimum free-energy structures of low SHAPE, low Shannon entropy regions in the MALAT1 lncRNA showing nearly identical structures for both cell types.
Structural differences of MALAT1 lncRNA in the green monkey Vero cell
Given the quantitative consistency of our data across different experimental conditions, we now investigate the structure of MALAT1 in the African green monkey, Chlorocebus sabaeus, also known as the vervet, using the Vero cell line (ATCC CCL 81, derived from kidney tissue of adult African green monkey). The African green monkey and human share a common ancestor from approximately 25 million years ago (Kumar and Hedges 1998). As with Hek293 and A549 cells, we performed RNA-seq in Vero cells and identified the region of the transcript that is expressed (Fig. 4A). As can be seen in Supplemental Figure 2, the Vero MALAT1 transcript is not spliced and has very similar primary structure to the human cell lines examined in this study, including a similar length. Since MALAT1 is unspliced in Vero cells as well, we were able to align Vero transcripts to chromosome 1 of the Chlorocebus sabaeus genome (GenBank Accession GCA_000409795.2). We used genomic sequences that were aligned to human genomic coordinates for MALAT1 (Karolchik et al. 2008; Pollard et al. 2010). We also observed the putative conserved triple helix region (Supplemental Fig. 2), based both on sequence conservation and lower read depth (as we also observed in human). This is illustrated in Figure 4A, and we define the observed Vero transcript in Figure 4B.
FIGURE 4.
Structure of MALAT1 in green monkey Vero cells in-cell versus cell-free reveals global changes in reactivity with minimal effect on highly structured regions of MALAT1. (A) RNA-seq coverage of expressed MALAT1 transcript in Vero cells from which we determined that the (B) observed transcript isoform in these cells is not alternatively spliced. (C) In-cell SHAPE-MaP data is highly reproducible across replicates as can also be seen in (D) representative traces of SHAPE data for two regions in MALAT1. (E) Median windowed (50 nt) SHAPE reactivity for in-cell data combined with SHAPE-derived Shannon entropy. (F) Correlation analysis of SHAPE reactivities colored by low (black), medium (yellow), and high (red) reactivity shows a clear decrease in slope for highly reactive nucleotides, despite an overall correlation comparable to in-cell replicates. This is also apparent in (G) representative regions of SHAPE data. (H) Modeled base-pairing probabilities (arc diagrams) and corresponding minimum free-energy structures of low SHAPE, low Shannon entropy regions in the MALAT1 lncRNA showing very similar structures both in-cell and cell-free.
As with previous data sets reported, we observe very high replicate in-cell correlation overall (Fig. 4C) and for specific regions as well (Fig. 4D). We used similar primers for the amplicons as those used in human cell lines (Supplemental Table S1) in these Vero cell experiments (Supplemental Table S2). From these we compute median SHAPE and Shannon entropy to identify low SHAPE, low Shannon entropy regions (Fig. 4E). We collected cell-free MALAT1 data in the Vero cell lines, and when we compared these with the in-cell data (Fig. 4F,G), we observed a similar decrease in slope in the highly reactive (red) nucleotides, consistent with nonspecific shielding by the cellular environment. The overall correlation in the data (R2 = 0.59) is still good, and the SHAPE patterns of in-cell (green) and cell-free (red) are very similar (Fig. 4G). Similarly, when we model the low-SHAPE, low Shannon entropy regions using these data, we obtain nearly identical base-pairing probabilities and minimum free-energy structures (Fig. 4H). Using these data, we will now investigate how substitutions from human to green monkey have affected the structure of these two MALAT1 lncRNAs.
Structural comparisons of MALAT1 lncRNA in the green monkey and human
To investigate the structural changes caused by substitutions in the MALAT1 transcript, we aligned both SHAPE data sets to a consensus sequence (Fig. 5A). In doing so, we were able to identify 335 substitutions (differences between the human and green monkey transcripts), comprised of 140 transitions, 124 transversions, and 71 insertions and deletions. These substitutions are indicated using vertical white lines in Figure 5B. As is apparent from this analysis, the substitutions are not evenly distributed but appear throughout the transcript. In Figure 5C we created a mountain plot where the y-axis is the number of substitutions in a 50-nt window. Certain mountains peak at over 15 substitutions per 50-nt window (e.g., near nucleotide 2000) with an effective divergence rate of 30% (i.e., 0.3 substitutions per site), while other regions (e.g., centered in nucleotide 4250) have a divergence rate around 2%.
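The substitution tally and the windowed counts behind the mountain plot can be sketched from a pairwise alignment as shown below. This is a minimal sketch, assuming two gapped sequences of equal length written in a consistent alphabet (A, C, G, T or U, and "-" for gaps), with each gapped column counted individually. The function names and toy sequences are illustrative, and the study's own counting of insertions and deletions may differ.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}

def classify_column(a, b):
    """Classify one alignment column as match, indel, transition, or transversion."""
    a, b = a.upper(), b.upper()
    if a == b:
        return "match"
    if a == "-" or b == "-":
        return "indel"
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"  # purine<->purine or pyrimidine<->pyrimidine
    return "transversion"

def tally(aligned_a, aligned_b):
    """Count each column class across the alignment."""
    return Counter(classify_column(a, b) for a, b in zip(aligned_a, aligned_b))

def windowed_counts(aligned_a, aligned_b, window=50):
    """Number of non-matching columns in a window centered on each column."""
    flags = [classify_column(a, b) != "match" for a, b in zip(aligned_a, aligned_b)]
    half = window // 2
    return [sum(flags[max(0, i - half): i + half]) for i in range(len(flags))]

# Toy alignment (synthetic, for illustration only).
human = "ACGTACGT-ACG"
monkey = "ACATACCTGACG"
print(tally(human, monkey))
print(windowed_counts(human, monkey, window=4))

# Divergence rate quoted in the text: 15 substitutions in a 50-nt window.
print(15 / 50)
```

Dividing a window's count by the window length gives the per-site divergence rate, so 15 substitutions in 50 nt corresponds to the 30% figure quoted above.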
FIGURE 5.
Comparative structural analysis of human and green monkey MALAT1 lncRNAs. (A) Consensus MALAT1 transcript observed in all cell lines. (B) Substitution analysis of human and green monkey MALAT1 lncRNAs, vertical white lines indicate sites of substitution, insertion, or deletion. (C) Mountain plots of substitutions using a 50-nt window, where the y-axis is the number of substitutions in the window, illustrating regions of high substitution. (D) Arc plot diagrams including alignments of Vero and A549 base-pairing probabilities for three representative low-SHAPE low Shannon entropy overlapping regions. Sequence alignment is shown, and sites of substitution are indicated with an asterisk (*). These indicate a high level of structural conservation, which can be seen in E, the corresponding minimum free-energy structures. Sites of substitution, insertion, and deletion are indicated with blue labels on the structures. (F) SHAPE reactivity analysis of sites with versus without substitutions for the in-cell data sets reveals a higher median SHAPE for sites with substitutions. (G) The best-fit logistic regression of substitution presence/absence as a function of SHAPE reactivity (solid black lines) estimated that substitution rate increased 131% (Hek293) and 120% (A549) with every unit increase in SHAPE reactivity. Gray bars show the proportion of sites with substitutions (±95% confidence intervals) for sites with SHAPE reactivities less than 0.5, 0.5–1.5, 1.5–2.5, and greater than 2.5. The number of sites within each bin is shown at the bottom of the bar.
SHAPE-directed structural modeling was optimized using reference RNAs with well-defined structures. These reference structures generally had low median SHAPE reactivity and low Shannon entropy (Deigan et al. 2009; Hajdin et al. 2013). As such, we can only compare SHAPE-directed structures for regions of low-SHAPE, low Shannon entropy that overlap between human and green monkey. In Figure 5D we show arc probability plots for the three most similar regions, which also have intermediate divergence rates. We used the in-cell SHAPE data to model these structures to capture the most biologically relevant conformations, although these regions fold very similarly cell-free.
What is immediately apparent from the base-pairing probability plots in Figure 5D is their high degree of similarity. Indeed, when we analyze the SHAPE-directed minimum free-energy structures we also observe similar folds (Fig. 5E). It is important to note that not all elements of the secondary structure are identical, though, and the structures do differ in specific helices. Overall, substitutions in unpaired regions appear not to alter the RNA structures, but those mapping to paired regions do. Given the high degree of experimental structural similarity observed in these three regions, we performed R-scape analysis to look for evidence of conserved RNA structure in pairwise covariations (Rivas et al. 2017). For the three conserved structures shown in Figure 5E, R-scape analysis found no significant base pairs. Stockholm formatted alignments are provided in the Supplemental Material.
We decided to investigate whether paired or unpaired nucleotides have a different rate of substitution. We used a t-test to compare the SHAPE reactivity of conserved positions (identical nucleotide in humans and green monkeys) versus substituted positions (different nucleotide in humans than in green monkeys) in the MALAT1 transcript. Consistent with the structure conservation, conserved positions exhibited significantly lower SHAPE reactivities than substituted positions (measured in Hek293: 0.050 difference in mean SHAPE reactivity, t = 1.92, df = 342, P = 0.028; measured in A549: 0.057 difference in mean SHAPE reactivity, t = 1.70, df = 348, P = 0.045; using one-sample t-tests) (Fig. 5F). Thus, conserved positions are more likely to be paired than substituted positions. Flipping the causative relationship around, we then used logistic regression to investigate the effect of SHAPE reactivity on the rate of nucleotide substitutions. The logistic regression analysis estimated a 131% (or 120%) increase in substitution rate with every increase in SHAPE reactivity of 1 unit as measured in the Hek293 (or A549) cell lines (Table 1; Fig. 5G). Thus, the least-paired positions (SHAPE ≥ 3) exhibit substitution rates that are at least 1.31³ = 2.25 (or 1.2³ = 1.73) times higher than the most-paired positions (SHAPE ≤ 0), suggesting unpaired nucleotides are more likely to be substituted. Note that since we perform a background subtraction of untreated cellular RNA during the SHAPE-MaP experiment, any effects of differential levels of methylation at specific nucleotides will effectively be canceled out.
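The logistic regression step and the compounding of its per-unit effect over three SHAPE units can be sketched as follows. This is a sketch under stated assumptions: a single-predictor logistic model fitted by Newton-Raphson in plain Python, run on a small synthetic data set. It is not the authors' analysis code, and `fit_logistic` and `fold_change` are hypothetical helper names. Strictly speaking, exp(slope) is an odds ratio, which approximates a rate ratio when substitutions are rare.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p          # gradient with respect to b0
            g1 += (yi - p) * xi   # gradient with respect to b1
            h00 += w              # entries of the 2x2 information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def fold_change(per_unit_ratio, delta):
    """Compound a per-unit multiplicative effect over `delta` units."""
    return per_unit_ratio ** delta

# Synthetic presence/absence data (illustration only): 1 of 4 sites
# substituted at SHAPE 0, and 2 of 4 sites substituted at SHAPE 1.
shape = [0, 0, 0, 0, 1, 1, 1, 1]
substituted = [1, 0, 0, 0, 1, 1, 0, 0]
b0, b1 = fit_logistic(shape, substituted)
print(f"per-unit odds ratio = {math.exp(b1):.2f}")

# Per-unit multipliers used in the text, compounded over 3 SHAPE units.
print(round(fold_change(1.31, 3), 2), round(fold_change(1.2, 3), 2))
```

Because the model is linear on the log-odds scale, a per-unit multiplier r compounds to r raised to the number of units, which is where the cube in the comparison of SHAPE ≥ 3 with SHAPE ≤ 0 comes from.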
TABLE 1.
Logistic regression estimates of the effect of SHAPE reactivity on substitution rate
DISCUSSION
We investigated the structure of the lncRNA MALAT1 expecting to detect regions where either the cellular environment or evolutionary substitutions significantly altered the structure. It is likely lncRNA function is cell-type-specific, since they are often expressed at low levels in a tissue-specific manner (Cabili et al. 2011). In fact, RNA structure in general is impacted by the cellular environment (Rouskin et al. 2014; Tomezsko et al. 2020). This suggests that it is also important to understand the effect the environment has on the lncRNA and the different pressures that may be affecting how it folds in the cell.
Previous studies have shown that it is difficult to identify even homologs of lncRNAs past a relatively short evolutionary distance (Necsulea and Kaessmann 2014; Necsulea et al. 2014; Hezroni et al. 2015), so we focused our attention on an exceptional lncRNA that is known to have function and is also known to be conserved in many different species, albeit structural conservation is seen only at the 3′ end outside of mammals (Brown et al. 2014; Xiping et al. 2018). MALAT1 is an abundant nuclear RNA, with roles in modulating gene expression by binding to transcription factors and other RNAs (Wilusz et al. 2008, 2012; Brown et al. 2012). It also is known to bind to many additional proteins (Spiniello et al. 2018; Scherer et al. 2020).
We used SHAPE-MaP and a gene-specific primer amplification strategy to examine the full-length MALAT1 structure in cells and cell-free, in two cell types, and across two species (African green monkey and human). First, we demonstrated high reproducibility between replicates in all cells studied. For each cell line, we observed that MALAT1 structure was very similar whether probed in-cell or cell-free. When we compared A549 cells to Hek293 cells, anticipating a remodeling of structure in the different in-cell contexts, we saw that MALAT1 structure remained largely unchanged. This result is especially surprising given MALAT1's conferral of metastatic potential in specific lung cancer cell lines, including A549 cells (Gutschner et al. 2013). Finally, we were curious whether MALAT1 structure would be impacted by evolutionary substitutions that accrue between species. When we compared MALAT1 structure models derived from in-cell probing of either a human cell line or a green monkey cell line, we saw that structure was still conserved despite differences in sequence, especially in common low-SHAPE, low-Shannon entropy regions.
Our metric for structural conservation is a very global one in this context. We chose to simply measure the correlation between our SHAPE data in different conditions. We used this approach rather than comparing SHAPE-directed structural models, as these are notoriously sensitive to small differences in the data (Kladwang et al. 2011). This is especially true for large RNAs like MALAT1. When we focus on regions of low-SHAPE, low-Shannon entropy, the similarity between these high-confidence structural models continues to support our finding that there is structural conservation between vervet and human MALAT1. We considered computing windowed correlation coefficients to detect regions of high structural divergence. However, we found that this approach was very sensitive to small changes in the data and that the choice of window size was arbitrary.
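As an illustration of the global metric described above, the sketch below computes a Pearson correlation between two reactivity profiles and, for comparison, the windowed variant that was considered and set aside. It is a minimal example under stated assumptions: the text does not say which correlation coefficient was used, so Pearson is an assumption, as are the function names, the handling of missing values, and the toy reactivities.

```python
import math


def pearson(x, y):
    """Pearson r over positions where both profiles have a usable value."""
    pairs = [(a, b) for a, b in zip(x, y)
             if a is not None and b is not None
             and not math.isnan(a) and not math.isnan(b)]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def windowed_pearson(x, y, window):
    """Pearson r in every sliding window of the given length (step of 1)."""
    return [pearson(x[i:i + window], y[i:i + window])
            for i in range(len(x) - window + 1)]


# Toy profiles (not measured data); NaN marks positions without a reactivity.
in_cell = [0.1, 0.9, 0.0, 1.5, 0.3, float("nan"), 0.7, 0.2]
cell_free = [0.2, 1.1, 0.1, 1.4, 0.2, 0.5, float("nan"), 0.3]

global_r = pearson(in_cell, cell_free)
local_r = windowed_pearson(in_cell, cell_free, window=4)
```

With short windows, one or two discordant positions can swing the local coefficient widely, which is consistent with the sensitivity noted above.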
In Supplemental Figure 3 we have plotted distributions of SHAPE reactivity as a function of nucleotide identity. We observe a higher median SHAPE reactivity for adenine nucleotides, in agreement with previous reports (Busan et al. 2019). This raises the possibility that regions enriched for adenine nucleotides, by chance or biology, might be misidentified as unstructured. Indeed, we observe overrepresentation of adenines in high-SHAPE, high Shannon entropy regions (frequency of adenines = 0.35) compared to low-SHAPE, low Shannon entropy regions (frequency of adenines = 0.29). To assess this possibility, we used a bootstrap resampling method to determine what fraction of 50-nt windows (which are used for low-SHAPE, low Shannon entropy determination) would be expected to exceed a mean reactivity of 0.8 (high SHAPE) by chance. We generated 10⁶ independent samples of 50 nt each, by randomly selecting 50 positions in the MALAT1 sequence, and calculated the mean reactivity for each sample. The resulting bootstrap data confirm that mean reactivity within the window/sample increases with the number of A bases in the window/sample, but only by an average of 0.0068 per additional A base (Supplemental Fig. 3C). Among all the bootstrap samples, the proportion with mean reactivity >0.8 was 0.00053 (i.e., the probability of mean r > 0.8). Even among samples for which the frequency of A bases was at least 0.5, the proportion with mean reactivity >0.8 was 0.0031. Thus, although we cannot exclude that the nucleotide bias may have misidentified regions as unstructured, the probability of this occurring is small. This is likely because the pseudo-free-energy term is robust in its ability to predict RNA structure (Deigan et al. 2009; Wilkinson et al. 2009).
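The resampling procedure described above can be sketched as follows. This is a hedged illustration, not the published analysis code: the sequence and reactivities are synthetic stand-ins, the function names are hypothetical, sampling without replacement is an assumption (the text does not specify), and far fewer samples are drawn than in the study so that the example runs quickly.

```python
import random


def bootstrap_window_means(sequence, reactivity, n_samples, window=50, seed=0):
    """Return (number_of_A, mean_reactivity) for each random sample of positions."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        # Assumption: positions are drawn without replacement within a sample.
        idx = rng.sample(range(len(sequence)), window)
        a_count = sum(1 for i in idx if sequence[i] == "A")
        mean_r = sum(reactivity[i] for i in idx) / window
        samples.append((a_count, mean_r))
    return samples


def fraction_above(samples, threshold=0.8, min_a_fraction=0.0, window=50):
    """Proportion of samples with mean > threshold, among samples with enough A."""
    kept = [m for a, m in samples if a / window >= min_a_fraction]
    if not kept:
        return float("nan")  # no sample met the A-frequency filter
    return sum(m > threshold for m in kept) / len(kept)


# Synthetic stand-in data (NOT the MALAT1 measurements): a random sequence in
# which adenines are given a slightly higher expected reactivity.
rng = random.Random(1)
sequence = "".join(rng.choice("ACGU") for _ in range(2000))
reactivity = [rng.expovariate(1 / (0.5 if base == "A" else 0.4)) for base in sequence]

samples = bootstrap_window_means(sequence, reactivity, n_samples=2000)
p_all = fraction_above(samples)
p_a_rich = fraction_above(samples, min_a_fraction=0.5)
```

Keeping the A count alongside each sample mean allows both the overall proportion and the proportion among A-rich samples to be read from the same set of draws.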
There are important caveats to this work. The fact that lncRNA MALAT1 has low-SHAPE, low Shannon entropy structures that are conserved does not necessarily mean that those structures are functional. They may simply appear conserved because there has not been sufficient evolutionary drift to disrupt them. It is also possible that SHAPE chemical probing is simply not sufficiently sensitive to capture subtle changes in structure. Finally, we focused on a relatively short evolutionary distance, anticipating that sequence differences would have a large impact on structure. It will be important to examine the role of structure across longer evolutionary distances, and an obvious next step would be to probe the murine MALAT1 lncRNA. However, the mouse MALAT1 sequence is significantly diverged from human, making direct alignments more complex, and this approach may not yield any conserved structures. Thus, an intermediate evolutionary distance between monkey and mouse could also be interesting (for example, marmoset and lemur).
MATERIALS AND METHODS
Cell culture
A549, Hek293, and Vero cells (ATCC CCL 185, CRL 1573, CCL 81, respectively) were cultured on tissue culture plates in DMEM supplemented with 10% fetal bovine serum (FBS) and 0.5% penicillin and streptomycin at 37°C and 5% CO2. Experiments were performed before cells were completely confluent.
In-cell RNA SHAPE treatment and RNA extraction
Cells were grown in 10-cm plates to 80%–90% confluency. Cells were washed with PBS and replenished with 1350 µL of media supplemented with 200 mM bicine (pH 8.0) buffer (final concentration). A total of 150 µL 5-nitroisatoic anhydride (5NIA) at 250 mM in dimethyl sulfoxide (DMSO) was added. Controls were treated with 150 µL DMSO. Cells were then incubated at 37°C for 10 min. RNA was extracted by TRIzol purification and DNase digestion (Thermo Fisher TRIzol, 5PRIME Phase Lock Heavy, Invitrogen Purelink RNA columns, Thermo Fisher Purelink DNase Set) and quantified with a NanoDrop spectrophotometer.
Cell-free RNA SHAPE treatment
Cell-free experiments were performed on natively transcribed RNA from the same cell lines described above. RNA was TRIzol extracted and quantified (as above). An amount of 10 µg RNA from each experiment was incubated at 37°C for 10 min in folding buffer containing 100 mM Na-HEPES, pH 8.0, 100 mM NaCl, and 10 mM MgCl2. The RNA was then incubated at 37°C for 5 min with 10% DMSO or with 250 mM 5NIA in DMSO. Columns were used to purify the modified RNA (Invitrogen Purelink RNA columns).
Reverse transcription and library preparation
For each sample of RNA, 3–10 µg was reverse transcribed using SHAPE-MaP reverse transcription with SuperScript II, random nonamers, and low-fidelity conditions for all samples (Thermo Fisher Scientific SuperScript II, NEB random nonamers) (Siegfried et al. 2014; Smola et al. 2015). The samples were purified with Ampure XP beads to isolate the cDNA and eluted to 30 µL. We designed species-specific MALAT1 tiling primers for human and green monkey (Supplemental Table S1) and performed multiplex PCRs following the Qiagen multiplex PCR protocol (Qiagen). cDNA was divided equally between primer sets. We performed secondary PCR to add TruSeq barcodes (NEB Q5 HotStart). Libraries were sequenced as paired-end 2 × 250 read multiplex runs on a MiSeq instrument.
SHAPE data analysis and RNA structure prediction
Two experimental repeats were collected for each cell type and condition. For the cell-free data, this included two Hek293 cell-free repeats and a single A549 cell-free experiment. SHAPE reactivities were derived using the ShapeMapper2 pipeline (Busan and Weeks 2018). The two A549 in-cell replicates were combined by averaging the reactivities at each nucleotide, yielding a single averaged reactivity per nucleotide. To compare data across experiments, the replicates of each data set were combined in the same way and normalized to the A549 in-cell combined data. We normalized by finding the median SHAPE reactivity for each experiment and multiplying each value in that condition by the ratio of the A549 in-cell SHAPE median to the SHAPE median of that condition. For structure diagrams, median SHAPE, and Shannon entropy, the data shown are the repeats for each condition, combined, averaged, and normalized to the A549 in-cell data. For human cell-free data, one replicate each of A549 cell-free and Hek293 cell-free were combined, averaged, and normalized as above. Arc plots and graphs for median SHAPE reactivity and Shannon entropy were obtained from SuperFold, which also incorporates chemical probing data (Smola et al. 2015). SuperFold was also used to find regions of significant structure in each of the conditions and to generate minimum free-energy (MFE) secondary structure models. SuperFold uses RNAstructure to calculate the MFE, again with chemical probing data incorporated (Deigan et al. 2009; Wilkinson et al. 2009). Secondary structure plots were created with VARNA (Darty et al. 2009). Reactivity plots and scatter plots were created in Python using the PyPlot package. We performed R-scape (Rivas et al. 2017) analysis on the three regions of similar structure identified in Figure 5E as follows.
We used NCBI BLAST to identify all sequences “somewhat similar” to the reference human sequence (using the blastn -p), and aligned the resulting sequences using Clustal (Sievers and Higgins 2014, 2021) for input into R-scape.
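The replicate averaging and median normalization described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, missing reactivities are represented as None, a position covered by only one replicate keeps that single value (the text does not say how such positions were treated), and the numbers are toy values.

```python
from statistics import median


def average_replicates(rep_a, rep_b):
    """Per-nucleotide mean of two replicates; None marks a missing value."""
    averaged = []
    for a, b in zip(rep_a, rep_b):
        values = [v for v in (a, b) if v is not None]
        averaged.append(sum(values) / len(values) if values else None)
    return averaged


def normalize_to_reference(profile, reference):
    """Scale a profile so that its median matches the reference median."""
    ref_median = median(v for v in reference if v is not None)
    own_median = median(v for v in profile if v is not None)
    scale = ref_median / own_median  # assumes a nonzero median
    return [None if v is None else v * scale for v in profile]


# Toy values: the combined A549 in-cell profile serves as the reference.
a549_in_cell = average_replicates([0.2, 0.8, None, 1.0], [0.4, 0.6, 0.5, 1.2])
hek293_in_cell = [0.6, 1.4, 1.0, 2.2]
hek293_normalized = normalize_to_reference(hek293_in_cell, a549_in_cell)
```

Because every value in a condition is multiplied by the same ratio, the rank order of reactivities within that condition is unchanged; only the scale is matched to the reference.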
MALAT1 conservation analysis and alignments and genome reference
We downloaded PhyloP-scored alignments of 30 mammals from UCSC Genome Browser (http://hgdownload.cse.ucsc.edu/goldenPath/hg38/phyloP30way/) (Karolchik et al. 2008; Pollard et al. 2010) and show the PhyloP scores of genome coordinates corresponding to MALAT1. Since the full-length MALAT1 transcript is not annotated in green monkey, we used the genomic African green monkey sequence (GenBank Accession GCA_000409795.2) that was aligned to human MALAT1 from the PhyloP alignments above to align our RNA-seq reads from vervet MALAT1 pull-down (Supplemental Material). We were able to align our RNA-seq reads to genomic African green monkey and report MALAT1 expression in Vero cells, and we found that Vero cells expressed a transcript with similar sequence and length to that of human Hek293 and A549 cells. Clustal Omega was used to align our human and vervet MALAT1 sequences to find a consensus sequence (Sievers and Higgins 2014, 2021). The sequences were then compared, and at each point where they differed, a substitution was logged. Substitution types were classified into four different groups: transition or transversion if a nucleotide was present in both sequences, and deletion or insertion if only one sequence had a nucleotide at that position according to the alignment (insertion and deletion were categorized relative to the human sequence). At each position in the sequence, SHAPE data were logged and categorized according to whether a substitution was present at that position, and then further split by type. All data used during this process come from the combined in-cell experiments.
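The substitution logging described above can be illustrated with a short sketch that walks a gapped pairwise alignment and labels each column. It is an example under stated assumptions rather than the published implementation: the function name is hypothetical, "insertion" is taken to mean a nucleotide present in green monkey but absent in human and "deletion" the reverse, insertion columns carry no human SHAPE value, and the sequences and reactivities are toy data.

```python
# Purine-purine or pyrimidine-pyrimidine changes (both U and T spellings).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "U"), ("U", "C"), ("C", "T"), ("T", "C")}


def classify_alignment(human_aln, monkey_aln, human_shape):
    """Label each alignment column and attach the human SHAPE value where one exists."""
    if len(human_aln) != len(monkey_aln):
        raise ValueError("aligned sequences must have equal length")
    records = []
    pos = 0  # index into the ungapped human sequence
    for h, m in zip(human_aln.upper(), monkey_aln.upper()):
        if h == "-" and m == "-":
            continue  # column with no nucleotide in either sequence
        if h == "-":
            # Assumption: extra monkey nucleotide = insertion relative to human.
            records.append(("insertion", None))
            continue
        shape = human_shape[pos]
        pos += 1
        if m == "-":
            label = "deletion"
        elif h == m:
            label = "conserved"
        elif (h, m) in TRANSITIONS:
            label = "transition"
        else:
            label = "transversion"
        records.append((label, shape))
    return records


# Toy alignment (not MALAT1 sequence) with one example of every category.
human = "ACGA-UA"
monkey = "AUGCCU-"
shape = [0.1, 0.9, 0.0, 0.7, 0.4, 1.2]  # one value per ungapped human position
records = classify_alignment(human, monkey, shape)
labels = [label for label, _ in records]
```

Grouping the SHAPE values in the returned records by label gives the conserved versus substituted reactivity distributions that the comparison relies on.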
For each cell type (Hek293, A549, and Vero), the resulting data sets for conserved (human and green monkey sequences are identical) and diverged (human and green monkey sequences differ) positions were compared. Violin, strip, and box plots were generated in the Seaborn package for Python. t-tests were performed using the t.test function from the stats package in R version 2.2.0. Logistic regressions were performed using the glm function from the stats package in R, using a binomial error distribution and a logit link function. The IGV viewer was used as a reference to visualize MALAT1 coordinates in the human genome hg38 (Robinson et al. 2010).
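The two statistical steps named above were run in R with t.test and glm; the sketch below is a pure-Python analogue intended only to show the calculations, not to reproduce the reported values. Assumptions are labeled: the Welch (unequal-variance) form of the t statistic is used because it is the default of R's t.test, the logistic model has a single predictor and is fitted by Newton-Raphson, all function names are hypothetical, and the data are synthetic. A P-value would additionally require the t distribution, which is not included here.

```python
import math
import random


def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom for mean(a) - mean(b)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se2 = va / na + vb / nb
    t = (ma - mb) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


def fit_logistic(x, y, n_iter=25):
    """Newton-Raphson fit of logit P(y = 1) = b0 + b1 * x; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # score for the intercept
            g1 += (yi - p) * xi       # score for the slope
            h00 += w                  # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0:
            break
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0, b1


# Synthetic data (NOT the MALAT1 measurements): substitution is made more
# likely at higher reactivity so that the fitted slope is positive.
rng = random.Random(0)
shape = [rng.expovariate(2.0) for _ in range(3000)]
substituted = [1 if rng.random() < 1 / (1 + math.exp(-(-2.0 + 1.0 * s))) else 0
               for s in shape]

b0, b1 = fit_logistic(shape, substituted)
odds_ratio_per_unit = math.exp(b1)  # multiplicative change in odds per SHAPE unit
diverged = [s for s, y in zip(shape, substituted) if y == 1]
conserved = [s for s, y in zip(shape, substituted) if y == 0]
t_stat, df = welch_t(diverged, conserved)
```

Exponentiating the fitted slope gives the per-unit multiplier, which is the quantity that a percentage increase per SHAPE unit expresses.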
DATA DEPOSITION
Raw sequence reads are available at the Sequence Read Archive (SRA accession number PRJNA816530), SHAPE reactivities are provided in the Supplemental Material as a SNRNASM file, and a GitHub page (https://github.com/LaederachLab/malat1) was created to facilitate access to the code used to analyze the data. Secondary structure files in .ct file format for high-confidence regions are provided in the Supplemental Material and summarized in Supplemental Table 3.
SUPPLEMENTAL MATERIAL
Supplemental material is available for this article.
ACKNOWLEDGMENTS
This work was supported by U.S. National Institutes of Health grants NHLBI R01 HL111527 and NIGMS R35 GM140844 to A.L. A.M.E. was supported by the National Science Foundation Graduate Research fellowship under grant number 5105482. C.A.W. is supported by U.S. National Institutes of Health grant K22CA262349.
Footnotes
Article is online at http://www.rnajournal.org/cgi/doi/10.1261/rna.079388.122.
Freely available online through the RNA Open Access option.
REFERENCES
- Arun G, Aggarwal D, Spector DL. 2020. MALAT1 long non-coding RNA: functional implications. Noncoding RNA 6: 22. doi:10.3390/ncrna6020022
- Brown JA, Valenstein ML, Yario TA, Tycowski KT, Steitz JA. 2012. Formation of triple-helical structures by the 3′-end sequences of MALAT1 and MENβ noncoding RNAs. Proc Natl Acad Sci 109: 19202–19207. doi:10.1073/pnas.1217338109
- Brown JA, Bulkley D, Wang J, Valenstein ML, Yario TA, Steitz TA, Steitz JA. 2014. Structural insights into the stabilization of MALAT1 noncoding RNA by a bipartite triple helix. Nat Struct Mol Biol 21: 633–640. doi:10.1038/nsmb.2844
- Busan S, Weeks KM. 2017. Visualization of RNA structure models within the Integrative Genomics Viewer. RNA 23: 1012–1018. doi:10.1261/rna.060194.116
- Busan S, Weeks KM. 2018. Accurate detection of chemical modifications in RNA by mutational profiling (MaP) with ShapeMapper 2. RNA 24: 143–148. doi:10.1261/rna.061945.117
- Busan S, Weidmann CA, Sengupta A, Weeks KM. 2019. Guidelines for SHAPE reagent choice and detection strategy for RNA structure probing studies. Biochemistry 58: 2655–2664. doi:10.1021/acs.biochem.8b01218
- Cabili MN, Trapnell C, Goff L, Koziol M, Tazon-Vega B, Regev A, Rinn JL. 2011. Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev 25: 1915–1927. doi:10.1101/gad.17446611
- Corley M, Solem A, Phillips G, Lackey L, Ziehr B, Vincent HA, Mustoe AM, Ramos SBV, Weeks KM, Moorman NJ, et al. 2017. An RNA structure-mediated, posttranscriptional model of human α-1-antitrypsin expression. Proc Natl Acad Sci 114: E10244–E10253. doi:10.1073/pnas.1706539114
- Dadonaite B, Gilbertson B, Knight ML, Trifkovic S, Rockman S, Laederach A, Brown LE, Fodor E, Bauer DLV. 2019. The structure of the influenza A virus genome. Nat Microbiol 4: 1781–1789. doi:10.1038/s41564-019-0513-7
- Darty K, Denise A, Ponty Y. 2009. VARNA: interactive drawing and editing of the RNA secondary structure. Bioinformatics 25: 1974–1975. doi:10.1093/bioinformatics/btp250
- Deigan KE, Li TW, Mathews DH, Weeks KM. 2009. Accurate SHAPE-directed RNA structure determination. Proc Natl Acad Sci 106: 97–102. doi:10.1073/pnas.0806929106
- Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A, Tanzer A, Lagarde J, Lin W, Schlesinger F, et al. 2012. Landscape of transcription in human cells. Nature 489: 101–108. doi:10.1038/nature11233
- Gutschner T, Hammerle M, Diederichs S. 2013. MALAT1: a paradigm for long noncoding RNA function in cancer. J Mol Med (Berl) 91: 791–801. doi:10.1007/s00109-013-1028-y
- Hajdin CE, Bellaousov S, Huggins W, Leonard CW, Mathews DH, Weeks KM. 2013. Accurate SHAPE-directed RNA secondary structure modeling, including pseudoknots. Proc Natl Acad Sci 110: 5498–5503. doi:10.1073/pnas.1219988110
- Hamada M, Kiryu H, Sato K, Mituyama T, Asai K. 2009. Prediction of RNA secondary structure using generalized centroid estimators. Bioinformatics 25: 465–473. doi:10.1093/bioinformatics/btn601
- Hezroni H, Koppstein D, Schwartz MG, Avrutin A, Bartel DP, Ulitsky I. 2015. Principles of long noncoding RNA evolution derived from direct comparison of transcriptomes in 17 species. Cell Rep 11: 1110–1122. doi:10.1016/j.celrep.2015.04.023
- Hutchinson JN, Ensminger AW, Clemson CM, Lynch CR, Lawrence JB, Chess A. 2007. A screen for nuclear transcripts identifies two linked noncoding RNAs associated with SC35 splicing domains. BMC Genomics 8: 39. doi:10.1186/1471-2164-8-39
- Ingolia NT, Brar GA, Stern-Ginossar N, Harris MS, Talhouarne GJ, Jackson SE, Wills MR, Weissman JS. 2014. Ribosome profiling reveals pervasive translation outside of annotated protein-coding genes. Cell Rep 8: 1365–1379. doi:10.1016/j.celrep.2014.07.045
- Ji P, Diederichs S, Wang W, Boing S, Metzger R, Schneider PM, Tidow N, Brandt B, Buerger H, Bulk E, et al. 2003. MALAT-1, a novel noncoding RNA, and thymosin β4 predict metastasis and survival in early-stage non-small cell lung cancer. Oncogene 22: 8031–8041. doi:10.1038/sj.onc.1206928
- Karolchik D, Kuhn RM, Baertsch R, Barber GP, Clawson H, Diekhans M, Giardine B, Harte RA, Hinrichs AS, Hsu F, et al. 2008. The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res 36: D773–D779. doi:10.1093/nar/gkm966
- Kennedy R, Lladser ME, Yarus M, Knight R. 2008. Information, probability, and the abundance of the simplest RNA active sites. Front Biosci 13: 6060–6071. doi:10.2741/3137
- Kim J, Piao HL, Kim BJ, Yao F, Han Z, Wang Y, Xiao Z, Siverly AN, Lawhon SE, Ton BN, et al. 2018. Long noncoding RNA MALAT1 suppresses breast cancer metastasis. Nat Genet 50: 1705–1715. doi:10.1038/s41588-018-0252-3
- Kladwang W, VanLang CC, Cordero P, Das R. 2011. Understanding the errors of SHAPE-directed RNA structure modeling. Biochemistry 50: 8049–8056. doi:10.1021/bi200524n
- Kumar S, Hedges SB. 1998. A molecular timescale for vertebrate evolution. Nature 392: 917–920. doi:10.1038/31927
- Kutchko KM, Laederach A. 2017. Transcending the prediction paradigm: novel applications of SHAPE to RNA function and evolution. Wiley Interdiscip Rev RNA 8: e1374. doi:10.1002/wrna.1374
- Kutchko KM, Sanders W, Ziehr B, Phillips G, Solem A, Halvorsen M, Weeks KM, Moorman N, Laederach A. 2015. Multiple conformations are a conserved and regulatory feature of the RB1 5′ UTR. RNA 21: 1274–1285. doi:10.1261/rna.049221.114
- Kutchko KM, Madden EA, Morrison C, Plante KS, Sanders W, Vincent HA, Cruz Cisneros MC, Long KM, Moorman NJ, Heise MT, et al. 2018. Structural divergence creates new functional features in alphavirus genomes. Nucleic Acids Res 46: 3657–3670. doi:10.1093/nar/gky012
- Lackey L, Coria A, Woods C, McArthur E, Laederach A. 2018. Allele-specific SHAPE-MaP assessment of the effects of somatic variation and protein binding on mRNA structure. RNA 24: 513–528. doi:10.1261/rna.064469.117
- Mathews DH. 2006. Revolutions in RNA secondary structure prediction. J Mol Biol 359: 526–532. doi:10.1016/j.jmb.2006.01.067
- McCown PJ, Wang MC, Jaeger L, Brown JA. 2019. Secondary structural model of human MALAT1 reveals multiple structure-function relationships. Int J Mol Sci 20: 5610. doi:10.3390/ijms20225610
- Mustoe AM, Busan S, Rice GM, Hajdin CE, Peterson BK, Ruda VM, Kubica N, Nutiu R, Baryza JL, Weeks KM. 2018. Pervasive regulatory functions of mRNA structure revealed by high-resolution SHAPE probing. Cell 173: 181–195.e118. doi:10.1016/j.cell.2018.02.034
- Nakagawa S, Ip JY, Shioi G, Tripathi V, Zong X, Hirose T, Prasanth KV. 2012. Malat1 is not an essential component of nuclear speckles in mice. RNA 18: 1487–1499. doi:10.1261/rna.033217.112
- Necsulea A, Kaessmann H. 2014. Evolutionary dynamics of coding and non-coding transcriptomes. Nat Rev Genet 15: 734–748. doi:10.1038/nrg3802
- Necsulea A, Soumillon M, Warnefors M, Liechti A, Daish T, Zeller U, Baker JC, Grutzner F, Kaessmann H. 2014. The evolution of lncRNA repertoires and expression patterns in tetrapods. Nature 505: 635–640. doi:10.1038/nature12943
- Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A. 2010. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res 20: 110–121. doi:10.1101/gr.097857.109
- Rice GM, Leonard CW, Weeks KM. 2014. RNA secondary structure modeling at consistent high accuracy using differential SHAPE. RNA 20: 846–854. doi:10.1261/rna.043323.113
- Rivas E. 2021. Evolutionary conservation of RNA sequence and structure. Wiley Interdiscip Rev RNA 12: e1649. doi:10.1002/wrna.1649
- Rivas E, Clements J, Eddy SR. 2017. A statistical test for conserved RNA structure shows lack of evidence for structure in lncRNAs. Nat Methods 14: 45–48. doi:10.1038/nmeth.4066
- Robinson MD, McCarthy DJ, Smyth GK. 2010. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139–140. doi:10.1093/bioinformatics/btp616
- Rouskin S, Zubradt M, Washietl S, Kellis M, Weissman JS. 2014. Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo. Nature 505: 701–705. doi:10.1038/nature12894
- Scherer M, Levin M, Butter F, Scheibe M. 2020. Quantitative proteomics to identify nuclear RNA-binding proteins of Malat1. Int J Mol Sci 21: 1166. doi:10.3390/ijms21031166
- Siegfried NA, Busan S, Rice GM, Nelson JA, Weeks KM. 2014. RNA motif discovery by SHAPE and mutational profiling (SHAPE-MaP). Nat Methods 11: 959–965. doi:10.1038/nmeth.3029
- Sievers F, Higgins DG. 2014. Clustal Omega. Curr Protoc Bioinformatics 48: 3.13.1–3.13.16. doi:10.1002/0471250953.bi0313s48
- Sievers F, Higgins DG. 2021. The Clustal Omega multiple alignment package. Methods Mol Biol 2231: 3–16. doi:10.1007/978-1-0716-1036-7_1
- Smola MJ, Weeks KM. 2018. In-cell RNA structure probing with SHAPE-MaP. Nat Protoc 13: 1181–1195. doi:10.1038/nprot.2018.010
- Smola MJ, Rice GM, Busan S, Siegfried NA, Weeks KM. 2015. Selective 2′-hydroxyl acylation analyzed by primer extension and mutational profiling (SHAPE-MaP) for direct, versatile and accurate RNA structure analysis. Nat Protoc 10: 1643–1669. doi:10.1038/nprot.2015.103
- Smola MJ, Christy TW, Inoue K, Nicholson CO, Friedersdorf M, Keene JD, Lee DM, Calabrese JM, Weeks KM. 2016. SHAPE reveals transcript-wide interactions, complex structural domains, and protein interactions across the Xist lncRNA in living cells. Proc Natl Acad Sci 113: 10322–10327. doi:10.1073/pnas.1600008113
- Spasic A, Assmann SM, Bevilacqua PC, Mathews DH. 2018. Modeling RNA secondary structure folding ensembles using SHAPE mapping data. Nucleic Acids Res 46: 314–323. doi:10.1093/nar/gkx1057
- Spiniello M, Knoener RA, Steinbrink MI, Yang B, Cesnik AJ, Buxton KE, Scalf M, Jarrard DF, Smith LM. 2018. HyPR-MS for multiplexed discovery of MALAT1, NEAT1, and NORAD lncRNA protein interactomes. J Proteome Res 17: 3022–3038. doi:10.1021/acs.jproteome.8b00189
- Sun Q, Hao Q, Prasanth KV. 2017. Nuclear long noncoding RNAs: key regulators of gene expression. Trends Genet 34: 142–157. doi:10.1016/j.tig.2017.11.005
- Tavares RCA, Pyle AM, Somarowthu S. 2019. Phylogenetic analysis with improved parameters reveals conservation in lncRNA structures. J Mol Biol 431: 1592–1603. doi:10.1016/j.jmb.2019.03.012
- Tomezsko P, Swaminathan H, Rouskin S. 2020. Viral RNA structure analysis using DMS-MaPseq. Methods 183: 68–75. doi:10.1016/j.ymeth.2020.04.001
- Tripathi V, Ellis JD, Shen Z, Song DY, Pan Q, Watt AT, Freier SM, Bennett CF, Sharma A, Bubulya PA, et al. 2010. The nuclear-retained noncoding RNA MALAT1 regulates alternative splicing by modulating SR splicing factor phosphorylation. Mol Cell 39: 925–938. doi:10.1016/j.molcel.2010.08.011
- Tyagi R, Mathews DH. 2007. Predicting helical coaxial stacking in RNA multibranch loops. RNA 13: 939–951. doi:10.1261/rna.305307
- Ulitsky I, Bartel DP. 2013. lincRNAs: genomics, evolution, and mechanisms. Cell 154: 26–46. doi:10.1016/j.cell.2013.06.020
- Waldern JM, Kumar J, Laederach A. 2021. Disease-associated human genetic variation through the lens of precursor and mature RNA structure. Hum Genet 141: 1659–1672. doi:10.1007/s00439-021-02395-9
- Wang MC, McCown PJ, Schiefelbein GE, Brown JA. 2021. Secondary structural model of MALAT1 becomes unstructured in chronic myeloid leukemia and undergoes structural rearrangement in cervical cancer. Noncoding RNA 7: 6. doi:10.3390/ncrna7010006
- Weeks KM. 2015. Review toward all RNA structures, concisely. Biopolymers 103: 438–448. doi:10.1002/bip.22601
- Wilkinson KA, Vasa SM, Deigan KE, Mortimer SA, Giddings MC, Weeks KM. 2009. Influence of nucleotide identity on ribose 2′-hydroxyl reactivity in RNA. RNA 15: 1314–1321. doi:10.1261/rna.1536209
- Wilusz JE, Freier SM, Spector DL. 2008. 3′ end processing of a long nuclear-retained noncoding RNA yields a tRNA-like cytoplasmic RNA. Cell 135: 919–932. doi:10.1016/j.cell.2008.10.012
- Wilusz JE, JnBaptiste CK, Lu LY, Kuhn CD, Joshua-Tor L, Sharp PA. 2012. A triple helix stabilizes the 3′ ends of long noncoding RNAs that lack poly(A) tails. Genes Dev 26: 2392–2407. doi:10.1101/gad.204438.112
- Woods CT, Laederach A. 2017. Classification of RNA structure change by “gazing” at experimental data. Bioinformatics 33: 1647–1655. doi:10.1093/bioinformatics/btx041
- Woods CT, Lackey L, Williams B, Dokholyan NV, Gotz D, Laederach A. 2017. Comparative visualization of the RNA suboptimal conformational ensemble in vivo. Biophys J 113: 290–301. doi:10.1016/j.bpj.2017.05.031
- Xiping Z, Bo C, Shifeng Y, Feijiang Y, Hongjian Y, Qihui C, Binbin T. 2018. Roles of MALAT1 in development and migration of triple negative and Her-2 positive breast cancer. Oncotarget 9: 2255–2267. doi:10.18632/oncotarget.23370
- Zhang B, Arun G, Mao YS, Lazar Z, Hung G, Bhattacharjee G, Xiao X, Booth CJ, Wu J, Zhang C, et al. 2012. The lncRNA Malat1 is dispensable for mouse development but its transcription plays a cis-regulatory role in the adult. Cell Rep 2: 111–123. doi:10.1016/j.celrep.2012.06.003
- Zhang X, Hamblin MH, Yin KJ. 2017. The long noncoding RNA Malat1: its physiological and pathophysiological functions. RNA Biol 14: 1705–1714. doi:10.1080/15476286.2017.1358347