Abstract
Cellular RNAs are heterogeneous with respect to their alternative processing and secondary structures, but the functional importance of this complexity is still poorly understood. A set of alternatively processed antisense non-coding transcripts, which are collectively called COOLAIR, are generated at the Arabidopsis floral-repressor locus FLOWERING LOCUS C (FLC)1. Different isoforms of COOLAIR influence FLC transcriptional output in warm and cold conditions2–7. Here, to further investigate the function of COOLAIR, we developed an RNA structure-profiling method to determine the in vivo structure of single RNA molecules rather than the RNA population average. This revealed that individual isoforms of the COOLAIR transcript adopt multiple structures with different conformational dynamics. The major distally polyadenylated COOLAIR isoform in warm conditions adopts three predominant structural conformations, the proportions and conformations of which change after cold exposure. An alternatively spliced, strongly cold-upregulated distal COOLAIR isoform6 shows high structural diversity, in contrast to proximally polyadenylated COOLAIR. A hyper-variable COOLAIR structural element was identified that was complementary to the FLC transcription start site. Mutations altering the structure of this region changed FLC expression and flowering time, consistent with an important regulatory role of the COOLAIR structure in FLC transcription. Our work demonstrates that isoforms of non-coding RNA transcripts adopt multiple distinct and functionally relevant structural conformations, which change in abundance and shape in response to external conditions.
Subject terms: Long non-coding RNAs, Gene silencing
The structures of single COOLAIR RNA isoforms change in abundance and shape in response to external conditions; structural mutation of these isoforms altered FLC expression and flowering time, consistent with a regulatory role of the COOLAIR structure in FLC transcription.
Main
COOLAIR transcripts are alternatively polyadenylated at proximal sites to give around 400-nucleotide (nt) class I transcripts, or at distal sites to give around 600–750-nt class II transcripts1 (Fig. 1a). The different COOLAIR isoforms have been functionally linked to R-loop-mediated chromatin silencing, transcriptional derepression in warm-grown plants2,7 and FLC transcriptional silencing in the cold3,4,6, through as yet poorly understood mechanisms. The secondary structure of RNA is emerging as an important regulator of RNA function8. Structural analysis of in vitro synthesized COOLAIR revealed the evolutionary conservation of class II COOLAIR structures, despite low nucleotide sequence identity5. However, knowledge of the COOLAIR structure in vivo is necessary to understand the function and complexity of COOLAIR in living cells. Current chemical probing methods were limiting for this purpose for two reasons: first, it has not been possible to accurately profile the full-length structural landscape and distinguish structures in shared regions between isoforms using short-read sequencing platforms; second, RNA conformational heterogeneity complicates querying the RNA secondary structures after chemical probing. Despite recent improvements in these techniques9–11 (Supplementary Discussion), the ability to directly identify different RNA isoforms and determine single-molecule in vivo conformations was still difficult. We therefore developed a single-molecule-based RNA secondary structure probing method that enables the direct determination of structural conformations of individual RNA isoforms.
Structural diversity of COOLAIR isoforms
COOLAIR is involved both in modulating the FLC transcriptional output to determine the winter annual or rapid-cycling reproductive strategy of warm-grown plants2 and in facilitating the cold-induced transcriptional shut-down that precedes stable epigenetic silencing by Polycomb Repressive Complex 2 in vernalization3,4,6. We therefore profiled the in vivo RNA secondary structure landscapes of all of the major isoforms, that is, class I and class II COOLAIR transcript isoforms (Fig. 1a and Extended Data Fig. 1a) in wild-type plants (Col FRI) grown in warm conditions and after two weeks of cold exposure when FLC is transcriptionally downregulated1,12. RNA structure determination was carried out using in vivo selective 2′-hydroxyl acylation analysed by primer extension (SHAPE) chemical probing in Arabidopsis thaliana seedlings. The SHAPE reagent, 2-methylnicotinic acid imidazolide (NAI), modifies single-stranded sites of all four RNA nucleotides13. The extracted RNAs were reverse transcribed, and the modified sites led to mutations in the complementary DNA (cDNA) (Fig. 1a). We then adapted the resulting cDNAs into the PacBio platform for single-molecule real-time sequencing, which we call single-molecule-based RNA structure sequencing (smStructure-seq). The derived raw reads were processed to obtain high-accuracy HiFi reads14 to generate the SHAPE reactivities based on the NAI-adduct mutational profiles (Fig. 1a). To benchmark the reproducibility and accuracy of our smStructure-seq data, we calculated the SHAPE reactivities of 18S rRNA. We found that our smStructure-seq libraries were highly reproducible with very high Pearson correlations of 0.95 (P value = 0.2 × 10−16). By comparing our SHAPE reactivities with the 18S rRNA phylogenetic secondary structure15, we found that our smStructure-seq analysis can accurately investigate the full-length RNA structure in vivo (a detailed explanation is provided in the legend of Extended Data Fig. 1b).
We next directly calculated the SHAPE reactivity profiles for class I.i, class I.ii, class II.i and class II.ii COOLAIR isoforms in warm and cold conditions (Fig. 1b and Extended Data Fig. 1c). Class I.i and class I.ii showed relatively few nucleotides with SHAPE reactivity (more than 95% nucleotides of class I isoforms showed no NAI-adduct mutation in warm-grown plants) (Extended Data Fig. 1c). The COOLAIR class I transcripts are associated with a stable R-loop structure2, potentially accounting for this low reactivity. In the same sample, the SHAPE reactivities of class II isoforms in warm-grown plants were much higher (Fig. 1b and Extended Data Fig. 1c). The overall SHAPE profiles were notably different between class II.i and class II.ii (Fig. 1b), even though most of these two isoforms were composed of the same sequence.
Thermodynamic parameter-based RNA structure analysis aims to find the thermodynamically favourable RNA structure16. However, long noncoding RNAs (lncRNAs), such as COOLAIR, are dynamically involved in co-transcriptional regulation and, therefore, thermodynamics may have an incomplete role in determining the RNA structure in vivo17. We therefore developed an analysis method for our smStructure-seq that adopted stochastic context-free grammar (SCFG) constrained by individual SHAPE reactivity profiles, enabling the determination of the RNA structure of single RNA molecules independent of thermodynamics. We named this structural analysis method DaVinci (Determination of the Variation of the RNA structure conformation through stochastic context-free grammar). DaVinci can construct a wide RNA structure landscape by generating the conformation of individual RNA structures from each in vivo SHAPE mutational profile (Extended Data Fig. 2a). Because DaVinci takes advantage of each single mutational profile rather than the averaged SHAPE mutational profiles, it can identify each possible conformation at single-molecule resolution. For example, DaVinci identified a cryptic conformation (conformation 3) of the HIV Rev response element (RRE)18 that was not identified by the chemical-reactivity-based clustering method11 (Extended Data Fig. 2b–e). This cryptic conformation becomes the major conformation when mutations are introduced in RRE61 (Supplementary Discussion; more validations are shown in Extended Data Figs. 2f–h and 3). Using DaVinci, we identified at least three major structural conformations of COOLAIR class II.i, the most abundant (Extended Data Fig. 1a) class II isoform in warm conditions (84.6% warm conformation 1; 10% warm conformation 2 and 5.4% warm conformation 3; Fig. 2a–d). These in vivo structural conformations are organized into three domains (Fig. 2a–c): the 5′ domain in exon 1; the 3′ major domain (3′M), or central domain, in exon 2; and the 3′ minor domain (3′m), or stalk domain, also in exon 2. All three warm conformations show a certain similarity to the in vitro class II.i structure5 in the 5′ and 3′m domains, but are distinct in the central 3′M domain (Extended Data Fig. 4a,c,d). Consistently, both measurements of topological similarity (tree alignment, TA) and base-pairing similarity (positive predictive value, PPV) showed that most differences between the in vitro structure and the conformations in the warm conditions are in the central domain (3′M domain) (Extended Data Fig. 4a–d). Notably, this region was proposed to be changed by a single natural nucleotide polymorphism in A. thaliana accession Var2–6 (ref. 7), which enhances the production of class II.iv (Extended Data Fig. 5a), a very rare transcript in Col FRI7. Class II.iv increases FLC expression through a co-transcriptional mechanism that involves the capping of the FLC nascent transcript7. We performed smStructure-seq on a genotype that carries the Var2–6 FLC allele introgressed into Col FRI (Extended Data Fig. 5b). The in vivo structure of class II.iv has a very short helix 4 (H4) and a merged H5 to extend H6 (Extended Data Fig. 5b,c). These structural changes occur in the region complementary to the FLC transcription start site (TSS) (Extended Data Fig. 5b,c). Thus, the greatest conformational variation in distally polyadenylated COOLAIR found in warm-grown plants lies in the region between H4 and H6, which we term the hyper-variable region; this region is complementary to the sequence of the FLC TSS (Extended Data Figs. 4e and 5c).
COOLAIR conformations change in the cold
We then determined COOLAIR isoform-specific structures in plants that had been exposed to cold for two weeks. After cold treatment, SHAPE profiles of class I transcripts still showed a low percentage of modification (Fig. 1b) and class II.i was still the most abundant class II isoform (Extended Data Fig. 1a). We identified at least three class II.i conformations (68.1% cold conformation 1; 17.8% cold conformation 2 and 14.1% cold conformation 3 in Fig. 3). Cold conformations 1 and 2 are structurally similar to warm conformations 1 and 2, but their relative proportions are slightly changed (Figs. 2 and 3). Cold conformation 3 is distinct from warm conformation 3, with the region between H4 and H6 joined into a long stem in cold conformation 3 (Figs. 2 and 3). Taken together, there are two predominant structural conformations of class II.i, the relative proportions of which change in response to cold, with a new conformation emerging in cold-grown plants (cold conformation 3). Comparing the warm-specific (warm conformation 3) and cold-specific (cold conformation 3) structural landscapes of class II.i, the greatest structural difference again occurs in the hyper-variable H4–H6 region complementary to the FLC TSS (Extended Data Fig. 4f).
By contrast, the strongly cold-upregulated COOLAIR isoform, class II.ii6, which contains an additional exon compared with class II.i, was found not to adopt major conformations (Extended Data Fig. 6a,b). An ensemble-averaged structure model for class II.ii revealed four domains (Extended Data Fig. 6a,b), showing the high structural diversity of this isoform as indicated by the high Shannon entropy (Extended Data Fig. 6c,d). This feature might be involved in its functionality associated with the sequestration of FRIGIDA (FRI)6, the major activator of FLC transcription. FRI associates with a range of co-transcriptional regulators related to RNA polymerase II near the FLC promoter region in warm conditions and is sequestered, in a class-II.ii-dependent manner, into biomolecular condensates away from the FLC promoter after cold exposure6.
COOLAIR structure–function dissection
Our multiple structural comparisons have identified H4–H6 as a hyper-variable region (Extended Data Figs. 4e,f and 5c). To analyse the potential functional role of this region, we generated transgenic plants where the DNA contained four-nucleotide mutations (mut) designed to increase the bulge in the H4–H6 region by shortening H4 and H5 (Fig. 4a–d and Extended Data Fig. 7a). The structural effect of these four mutations was confirmed by smStructure-seq (Fig. 4d). We then performed a systematic characterization of the COOLAIR transcript isoforms in the mut line: the splicing pattern and expression level of COOLAIR were not affected (Extended Data Fig. 8a–d). However, the proportion of chromatin-bound class II.i increased in the mut line (Extended Data Fig. 8e), indicating an enhanced interaction between class II COOLAIR RNA and FLC chromatin. This was confirmed using chromatin isolation by RNA purification (ChIRP), which showed increased chromatin association of the class II COOLAIR across the FLC TSS region in the mut line (Fig. 4e,f). This 5′ ChIRP signal has previously been shown to be sensitive to proteinase K4. The mut lines produced lower levels of both unspliced and spliced FLC transcript (Fig. 4g and Extended Data Fig. 8f), and were consequently early flowering (Fig. 4h,i). A second mutant (mut-r) in which nucleotides were introduced to decrease the bulge and increase the H4–H6 helix behaved similarly to the wild-type transgene (Extended Data Fig. 7a–c).
Because the introduced mutations were close to the FLC TSS, they could potentially influence sense FLC transcription activity itself. We therefore introduced the same mutations into a transgene in which antisense COOLAIR expression had been disrupted by inserting a NOS terminator (TEX 2.0)3 (Fig. 4a). FLC transcript levels in mut-TEX were similar to those of wild-type TEX lines (WT-TEX) and higher than those of the mut lines (Fig. 4g and Extended Data Fig. 8f), supporting the requirement of COOLAIR in the flowering time changes induced by the mutations. Whether COOLAIR must be associated with the chromatin to effect these functional changes was tested by crossing a line carrying the mut transgene with the wild type. Analysis of the F1 plants enabled us to examine whether COOLAIR derived from the mut transgene influenced FLC expression of the wild-type allele. We found that the FLC expression level in F1 lines was around 50% of that in the wild-type parental line (Extended Data Fig. 8g); therefore, the structural mutations function only on local FLC expression. In summary, increasing the bulges around the H4–H6 region promoted a COOLAIR–FLC chromatin association, reduced transcriptional output at the FLC locus and shortened the time to flower.
Given the complementarity of the H4–H6 region to the FLC TSS region, we reasoned that the conformation-dependent COOLAIR–FLC chromatin association might involve the direct binding of COOLAIR to FLC DNA. Potentially, COOLAIR could complement the FLC Watson strand to form a DNA–RNA duplex, although we have not found COOLAIR to form a significant R-loop at the 5′ end of FLC19. Alternatively, COOLAIR could bind to the double-stranded DNA (dsDNA) to form a DNA–RNA triplex20,21 (Extended Data Fig. 9a); the sequence content around the H4–H6 region (Fig. 4b,c) is capable of forming triplex structures with the dsDNA at the FLC TSS in vitro (Extended Data Fig. 9b). However, because of the proteinase K sensitivity4 of the ChIRP signal, we favour a model in which COOLAIR associates with a protein complex that binds close to the FLC TSS. FRI is central to establishing a local chromosomal environment at FLC22, so we tested the involvement of FRI in the functionality of COOLAIR conformation by analysing the structurally mutated transgene (mut) in both active FRI and null fri genotypes (Extended Data Fig. 8h). Structural mutations influence FLC expression in only the FRI genotype (Extended Data Fig. 8h). Therefore, in addition to the physical association of FRI with COOLAIR class II.ii in cold conditions, the structurally variable region of COOLAIR class II.i genetically interacts with FRI to regulate FLC expression in warm conditions. How the individual COOLAIR structural conformations of the different isoforms affect FLC transcription will be an exciting future area of investigation.
In summary, development of the single-molecule-based RNA structure profiling methodology has allowed us to directly determine the in vivo RNA structure of the antisense transcripts of COOLAIR. This methodology has enabled the structural conformations of each alternatively processed COOLAIR isoform to be described. In response to cold conditions, the proportion of COOLAIR adopting a certain conformation changes and new conformations emerge. Across the whole structural landscape of COOLAIR, we identified a structural element that showed the greatest conformational variation, which was complementary to the FLC TSS. We validated a functional role for this structural element in regulating COOLAIR–FLC chromatin association, FLC expression and flowering time, suggesting a functional role for RNA conformational changes in the environmental response of plants5,6,23–25. Our study provides insights into how lncRNA transcript isoforms can adopt different RNA structural conformations, and how these can functionally influence the association with chromatin and control transcription.
Methods
Statistics
No statistical methods were used to predetermine the sample size. The experiments were not randomized, and investigators were not blinded to allocation during experiments and outcome assessment. Sampling in all cases was performed by collecting materials independently from separate plants.
Plant materials and growth conditions
The genotypes Col FRISF2 (Col FRI) and Var2–6 near-isogenic line have been described previously3,7. FLCWT, FLCWT-TEX, FLCmut, FLCmut-r, FLCmut-TEX and FLCmut-r-TEX were transgenic lines carrying an approximately 12 kb wild-type or mutated FLC genomic fragment. FLCmut was generated by introducing four-nucleotide mutations using site-directed mutagenesis. FLCWT-TEX and FLCmut-TEX were generated by inserting a NOS terminator fragment in the first exon of COOLAIR in the wild-type or mutated FLC genomic fragment, respectively3. FLCmut-r was generated by inserting a fragment (GAAATAAAGCGAGAACAAATGAAAACCCAGGT) complementary to the big bulge in the H4–H6 region using site-directed mutagenesis. Primers used for the construction are listed in Supplementary Table 1. The fragments were then cloned into SLJ77515 (ref. 26) and transformed into the Arabidopsis flc-2 FRI genotype3 with a floral-dipping method. Transgenic lines with a single insertion that segregated 3:1 for Basta resistance were identified in the T2 generation to generate homozygous T3 lines. T3 homozygous lines with FLCmut in flc-2 FRI background were crossed with Col FRI (WT) for F1 generation (Extended Data Fig. 8g) or with the flc-2 fri background for FLCmut fri (Extended Data Fig. 8h).
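The 3:1 segregation criterion for single-insertion lines can be illustrated with a goodness-of-fit check. The sketch below is only an illustration: the text does not state that a chi-square test was applied, the function names are hypothetical, and the seedling counts are invented. The critical value 3.841 is the standard chi-square threshold for one degree of freedom at a significance level of 0.05.

```python
# Illustrative sketch: does a T2 family fit the 3:1 resistant:sensitive ratio
# expected for a single transgene insertion? (The source does not say a
# chi-square test was used; this is an assumption for illustration.)

CRITICAL_VALUE_DF1_ALPHA_005 = 3.841  # chi-square critical value, df = 1, alpha = 0.05


def chi_square_3_to_1(resistant: int, sensitive: int) -> float:
    """Chi-square goodness-of-fit statistic against a 3:1 expectation."""
    total = resistant + sensitive
    if total == 0:
        raise ValueError("no seedlings scored")
    expected_resistant = 0.75 * total
    expected_sensitive = 0.25 * total
    return ((resistant - expected_resistant) ** 2 / expected_resistant
            + (sensitive - expected_sensitive) ** 2 / expected_sensitive)


def consistent_with_single_insertion(resistant: int, sensitive: int) -> bool:
    """True when the counts do not deviate significantly from 3:1."""
    return chi_square_3_to_1(resistant, sensitive) < CRITICAL_VALUE_DF1_ALPHA_005


if __name__ == "__main__":
    # Invented counts: the first family fits 3:1, the second looks like 15:1.
    print(consistent_with_single_insertion(75, 25))
    print(consistent_with_single_insertion(150, 10))
```

A family that segregates close to 15:1 would suggest two unlinked insertions and would fail this check.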
Seeds were surface-sterilized and sown on half-strength Murashige and Skoog medium. The plates were kept at 4 °C for 2–3 days. For warm-grown plants, seedlings were grown in warm conditions (16 h light, 8 h darkness with constant 20 °C) for 10 days. For the cold treatment, the plants were subjected to a two-week treatment at 5 °C (8 h light and 16 h dark conditions) after a 10-day pre-growth period in warm conditions.
(+)SHAPE and (−)SHAPE smStructure-seq library construction
We used the SHAPE reagent, NAI, to do the in vivo RNA secondary structure chemical probing. NAI was prepared as reported previously13. In brief, A. thaliana seedlings were completely covered in 20 ml 1× SHAPE reaction buffer (100 mM KCl, 40 mM HEPES (pH 7.5) and 0.5 mM MgCl2) in a 50-ml Falcon tube. NAI was added to a final concentration of 1 M and the tube swirled on a shaker (1,000 rpm). This high NAI concentration allows NAI to penetrate plant cells and modify the RNA in vivo. After quenching the reaction with freshly prepared dithiothreitol (DTT), the seedlings were washed with deionized water and immediately frozen with liquid nitrogen and ground into powder. Total RNA was extracted using the hot phenol method4, followed by DNase I treatment in accordance with the manufacturer’s protocol. The control group was prepared using DMSO (labelled as (−)SHAPE), following the same procedure as described above. Then, 2 µg (+)SHAPE or (−)SHAPE RNA samples was added to a 19-µl buffer system containing 2 µl 0.5 µM RNA–DNA hybrid adaptors (5′-rArGrArUrCrGrGrArArGrArGrCrArCrArCrGrUrCrUrGrArArCrUrCrCrArGrUrCrArC/3SpC3/ and 5′-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTN (N = equimolar A, T, G, C)), 4 µl 5× reaction buffer (2.25 M NaCl, 25 mM MgCl2, 100 mM Tris-HCl, pH 7.5), 2 µl 10× DTT (50 mM; made fresh or from frozen stock) and 1 µl TGIRT-III enzyme (10 µM; InGex). The reaction system was pre-incubated at room temperature for 30 min, then 1 µl of 25 mM dNTPs (an equimolar mixture of dATP, dCTP, dGTP and dTTP; at 25 mM each; RNA-grade) was added. The whole reaction system in the tube was incubated at 60 °C for 120 min. To remove the TGIRT-III enzyme from the template, 1 µl of 5 M NaOH was added and the sample incubated at 95 °C for 3 min. The sample was cooled down to room temperature and neutralized with 1 µl of 5 M HCl before the clean-up of the cDNAs with a MinElute Reaction Cleanup Kit (QIAGEN, 28204). 
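As a quick arithmetic check on the reverse-transcription mix described above, the final concentration of each component can be computed from its stock concentration and volume. This sketch assumes that the 19-µl buffer system already contains the RNA and all listed components, so that adding 1 µl of dNTPs gives a 20-µl reaction; the dictionary keys and function name are illustrative.

```python
# Sketch of the dilution arithmetic for the TGIRT-III reaction.
# ASSUMPTION: final reaction volume = 19 µl mix + 1 µl dNTPs = 20 µl.

TOTAL_VOLUME_UL = 19.0 + 1.0

# name: (stock concentration, volume added in µl); units are in the key name.
STOCKS = {
    "adaptor_uM": (0.5, 2.0),      # RNA-DNA hybrid adaptors
    "NaCl_mM": (2250.0, 4.0),      # from the 5x reaction buffer (2.25 M)
    "MgCl2_mM": (25.0, 4.0),       # from the 5x reaction buffer
    "Tris_HCl_mM": (100.0, 4.0),   # from the 5x reaction buffer
    "DTT_mM": (50.0, 2.0),         # 10x DTT
    "TGIRT_III_uM": (10.0, 1.0),
    "each_dNTP_mM": (25.0, 1.0),
}


def final_concentration(stock: float, volume_ul: float,
                        total_volume_ul: float = TOTAL_VOLUME_UL) -> float:
    """C_final = C_stock * V_added / V_total."""
    return stock * volume_ul / total_volume_ul


FINAL = {name: final_concentration(stock, vol) for name, (stock, vol) in STOCKS.items()}

if __name__ == "__main__":
    for name, value in FINAL.items():
        print(f"{name}: {value:g}")
```

Under this assumption the reaction contains 450 mM NaCl, 5 mM MgCl2, 20 mM Tris-HCl, 5 mM DTT and 1.25 mM of each dNTP, which matches the stated 5× and 10× stock designations.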
To capture class I and class II COOLAIR isoforms along with 18S rRNA, PCR reactions with 10 cycles were done with specific primers (Supplementary Table 1) using KOD Xtreme Hot Start DNA Polymerase (Novagen). The amplified DNA fragments from the eight replicates of the PCR reactions were merged to obtain sufficient DNA. The resulting DNA samples were size-selected using the Solid Phase Reversible Immobilization size-selection system (BECKMAN COULTER). Two independent biological replicates were generated for both (+)SHAPE and (−)SHAPE smStructure-seq libraries. The purified DNA samples were subjected to PacBio library construction by BGI using a PacBio Sequel 3.0.
smStructure-seq data analysis of COOLAIR isoforms
The raw reads from (+)SHAPE and (−)SHAPE libraries were converted into HiFi reads (circular consensus sequences) using ‘ccs’ (https://github.com/PacificBiosciences/ccs) with parameters ‘--minPasses=3’ in order to achieve around 99.8% predicted accuracy (Q30)14. The HiFi reads were demultiplexed using the demultiplex barcoding algorithm Lima v.1.11.0 (https://github.com/pacificbiosciences/barcoding). The derived HiFi reads were mapped to both COOLAIR references and 18S rRNA (Supplementary Table 1) using BLASR (v.5.3.3)27 with parameters ‘--minMatch 10 -m 5 --hitPolicy leftmost’. Each read was converted into a ‘bit vector’. In brief, each bit vector corresponds to a single read and consists of a series of zeroes (representing matches) and ones (mutations representing mismatches and unambiguously aligned deletions)11. To generate the overall SHAPE reactivity profiles, the mutation rate (MR) at a given nucleotide is simply the total number of ones divided by the total number of zeroes and ones at that location. Raw SHAPE reactivities of class II COOLAIR were then generated for each nucleotide using the following equation:
where (+)SHAPE corresponds to a NAI-treated sample and (−)SHAPE refers to a DMSO-treated sample. The true-negative rate, 1 − MR(−)SHAPE, represents the specificity at a specific location. The raw SHAPE reactivity (R) mathematically estimates the positive likelihood ratio of SHAPE modification. The raw SHAPE reactivity was normalized to a standard scale that spanned from 0 (no reactivity) to around 1 (high SHAPE reactivity)28 for showing the mutational profiles.
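The bit-vector bookkeeping described above can be sketched as follows. The per-position mutation rate follows the definition in the text (ones divided by zeroes plus ones). The reactivity function is an assumption rather than the published formula: it reads the description literally, taking the positive likelihood ratio as sensitivity divided by (1 − specificity), with MR(+)SHAPE assumed to play the role of sensitivity and 1 − MR(−)SHAPE as the stated specificity. The function names, the '?' symbol for uncovered positions and the toy reads are hypothetical.

```python
import math

# Sketch under stated assumptions; not the authors' implementation.


def mutation_rates(bit_vectors):
    """Per-position mutation rate: ones / (zeroes + ones).

    Each bit vector is a string over '0' (match) and '1' (mutation); any other
    character (for example '?') is treated as uncovered and ignored.
    """
    length = len(bit_vectors[0])
    if any(len(bv) != length for bv in bit_vectors):
        raise ValueError("all bit vectors must have the same length")
    rates = []
    for pos in range(length):
        ones = sum(1 for bv in bit_vectors if bv[pos] == "1")
        zeroes = sum(1 for bv in bit_vectors if bv[pos] == "0")
        covered = ones + zeroes
        rates.append(ones / covered if covered else math.nan)
    return rates


def likelihood_ratio_reactivity(mr_plus, mr_minus):
    """ASSUMED form: sensitivity / (1 - specificity) = MR(+) / (1 - (1 - MR(-))).

    Positions with no background mutations (MR(-) = 0) are returned as NaN
    because the ratio is undefined there.
    """
    reactivities = []
    for plus, minus in zip(mr_plus, mr_minus):
        false_positive_rate = 1.0 - (1.0 - minus)
        if math.isnan(plus) or math.isnan(minus) or false_positive_rate == 0:
            reactivities.append(math.nan)
        else:
            reactivities.append(plus / false_positive_rate)
    return reactivities


if __name__ == "__main__":
    plus_reads = ["0100", "0101", "0000", "0100"]    # toy NAI-treated reads
    minus_reads = ["0000", "0100", "0000", "0000"]   # toy DMSO-treated reads
    mr_plus = mutation_rates(plus_reads)
    mr_minus = mutation_rates(minus_reads)
    print(mr_plus, mr_minus)
    print(likelihood_ratio_reactivity(mr_plus, mr_minus))
```

The subsequent normalization to a 0 to around 1 scale is not sketched here because the text defers its details to ref. 28.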
Structural analysis of class II COOLAIR isoforms by DaVinci
The whole pipeline of DaVinci is illustrated in Extended Data Fig. 2a. The bitvectors generated from the previous step were transformed into constraint information (‘1’ representing single-stranded nucleotides) for each sequencing read of class II COOLAIR isoforms. The single-stranded constraints were incorporated into the SCFG engine of the DaVinci pipeline. The SCFG engine, including a set of transformation rules for SCFG and a probability distribution of the transformation rules for each non-terminal symbol, was provided by CONTRAfold29 with an extended function utility in CentroidFold30 (--engine CONTRAfold --sampling). The generated RNA structures with constraints derived from individual bitvectors were collected. Because different structures can have the same mutational profile during probing, we used the sampling function with the constraint of a bitvector to capture multiple structures of class II.ii COOLAIR isoforms. All of the collected RNA structures were transformed into dot-bracket strings followed by transformation into RNA structure elements using rnaConvert in the Forgi package31. The digitalized RNA secondary structure elements were extracted to create a numeric matrix and subjected to dimensionality reduction, such as PCA or multidimensional scaling. The dimensionality reduction results were clustered using k-means clustering with the k-means function from the scikit-learn Python package32. The value of k was determined by visual inspection. The representative structure for each cluster was identified by calculating the most common RNA structure type at each position (that is, the maximum expected accuracy) and was determined by the RNA structure that is at the centre of the cluster and most similar to the most common RNA structure. The base-pair probability was calculated by counting the frequency of all present base pairs in the conformation space.
The positional base-pair probability was derived by $P_i = \sum_{j}^{J} P_{ij}$, where $P_{ij}$ is the probability of base i being base-paired with base j, over all its potential J pairing partners. The likelihood of single strandedness was calculated by the expression $1 - P_i$. In addition, the Shannon entropy was calculated as $S_i = -\sum_{j}^{J} P_{ij} \log P_{ij}$.
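The counting-based quantities described here can be sketched from a collection of dot-bracket structures. The code below is a minimal illustration, not the DaVinci implementation: base-pair probabilities are pair frequencies across the ensemble, the positional probability sums over pairing partners, and the entropy uses the standard Shannon form over pairing partners. The logarithm base (base 2 here) and the omission of the unpaired state from the entropy sum are assumptions, and all function names and example structures are hypothetical.

```python
import math
from collections import Counter

# Sketch under stated assumptions; pseudoknot-free dot-bracket strings only.


def pairs_from_dot_bracket(structure):
    """Return the (i, j) base pairs (0-based, i < j) of a dot-bracket string."""
    stack, pairs = [], []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(index)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            pairs.append((stack.pop(), index))
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return pairs


def base_pair_probabilities(structures):
    """P_ij = fraction of structures in which i pairs with j."""
    counts = Counter()
    for structure in structures:
        counts.update(pairs_from_dot_bracket(structure))
    return {pair: count / len(structures) for pair, count in counts.items()}


def positional_profiles(structures):
    """Return (P_i, 1 - P_i, entropy_i) as three lists over positions."""
    length = len(structures[0])
    paired = [0.0] * length
    entropy = [0.0] * length
    for (i, j), p in base_pair_probabilities(structures).items():
        for pos in (i, j):
            paired[pos] += p                    # P_i = sum over partners of P_ij
            entropy[pos] -= p * math.log2(p)    # ASSUMPTION: log base 2
    unpaired = [1.0 - p for p in paired]
    return paired, unpaired, entropy


if __name__ == "__main__":
    ensemble = ["((..))", "((..))", "(....)", "......"]  # toy conformations
    paired, unpaired, entropy = positional_profiles(ensemble)
    print(paired)
    print(unpaired)
    print(entropy)
```

Positions that pair with several alternative partners across the ensemble receive higher entropy, which is the sense in which high Shannon entropy indicates structural diversity.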
Structural analysis of HIV-1 RRE, RRE61, cspA and TenA
Probing data for HIV-1 RRE11 were obtained from RRE-invitroDMS_NL43rna.bam (https://codeocean.com/capsule/6175523/tree/v1). Probing data for the cspA 5′ untranslated region33 at 37 °C and 10 °C were obtained from the Sequence Read Archive (accession numbers SRR6123773 and SRR6123774). We performed the RNA structure probing experiments of in vitro folded HIV-1 RRE61 RNAs (3 pmol) containing the stem loops III, IV and V18 as described previously11. The TenA RNAs (3 pmol) were subjected to NAI chemical treatment13,34 in the presence or absence of 1 µM thiamine pyrophosphate (TPP). The NAI-modified RNA samples (TPP-treated and non-TPP-treated RNAs) were mixed at a ratio of 20:80 (vol/vol) or 50:50 (vol/vol) for the library construction. All of the sequencing data were mapped to the respective references as described above. The subsequent bitvectors were generated and subjected to the DaVinci analysis described above, including the creation of the numeric matrix for the digitalized RNA structure elements, dimensionality reduction, k-means determination and representative structure construction. In silico structural ensemble analyses of wild-type RRE and mutant RRE61 were performed by Boltzmann sampling (10,000 times) using RNAfold35. The subsequent analysis for the in silico structure ensemble is the same as for the DaVinci analysis but includes only the steps of creating the numeric matrix for the digitalized RNA structure elements, dimensionality reduction, k-means determination and representative structure construction.
Total RNA extraction and RT–qPCR for gene expression analysis
Total RNA was extracted as previously described36. Genomic DNA was digested with TURBO DNA-free (Ambion Turbo DNase kit, AM1907) according to the manufacturer’s guidelines before reverse transcription was performed. Reverse transcription was performed with the SuperScript III Reverse Transcriptase (ThermoFisher, 18080093) following the manufacturer’s protocol using gene-specific primers. The standard reference gene UBC (At5g25760) for gene expression was used for normalization. All primers are listed in Supplementary Table 1.
Chromatin-bound RNA measurement assay
Chromatin-bound RNAs were extracted as previously outlined37. In brief, 2 g of warm-grown or cold-grown seedlings were ground into fine powder using a mortar in liquid nitrogen. Then, 10% of the material (about 200 mg fine powder) was used for total RNA extraction as described above. The nuclei from the remaining material were prepared with Honda buffer in the presence of 50 ng μl−1 tRNA, 20 U ml−1 RNase inhibitor (SUPERase-In; Life Technologies), and 1× cOmplete protease inhibitor (Roche). The nuclei pellet was resuspended in an equal volume of resuspension buffer (50% (vol/vol) glycerol, 0.5 mM EDTA, 1 mM DTT, 100 mM NaCl and 25 mM Tris-HCl pH 7.5) and washed twice with urea wash buffer (300 mM NaCl, 1 M urea, 0.5 mM EDTA, 1 mM DTT, 1% Tween-20 and 25 mM Tris-HCl pH 7.5). Two volumes of wash buffer were added to the resuspended nuclei and vortexed for 1 s. The chromatin was spun down and protein was removed using phenol–chloroform. RNAs from the supernatant were precipitated with isopropanol, dissolved and DNase-treated. The chromatin-bound RNAs were reverse-transcribed with the SuperScript III Reverse Transcriptase (ThermoFisher, 18080093) following the manufacturer’s protocol. A mixture of gene-specific primers (Supplementary Table 1) and EF1alpha (At5g60390.2)37,38, used to estimate how many RNAs were bound to genomic DNA (expressed as (chromatin-bound RNA)/EF1alpha), was included in the reverse-transcription reaction. The total RNAs were also reverse transcribed with the SuperScript III Reverse Transcriptase (ThermoFisher, 18080093) following the manufacturer’s protocol. A mixture of gene-specific primers (Supplementary Table 1) and PP2A (At1g13320) as a control was added to the reverse-transcription reaction, which estimates the total expression level of class II (expressed as (total RNA)/PP2A). The chromatin-binding ratio was calculated using the equation:
$$\text{chromatin-binding ratio} = \frac{(\text{chromatin-bound RNA})/\text{EF1alpha}}{(\text{total RNA})/\text{PP2A}}$$
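A worked example of the ratio can make the normalization explicit. The sketch below assumes that the chromatin-binding ratio is the EF1alpha-normalized chromatin-bound signal divided by the PP2A-normalized total signal, following the two "expressed as" definitions above. It further assumes a ΔCt-style quantification with 100% amplification efficiency, which the text does not specify; the function names and Ct values are hypothetical.

```python
# Sketch under stated assumptions; not the authors' analysis script.


def relative_level(ct_target: float, ct_reference: float) -> float:
    """Target level relative to a reference gene: 2^-(Ct_target - Ct_reference).

    ASSUMPTION: delta-Ct quantification with perfect doubling per cycle.
    """
    return 2.0 ** -(ct_target - ct_reference)


def chromatin_binding_ratio(ct_chromatin_target: float, ct_chromatin_ef1alpha: float,
                            ct_total_target: float, ct_total_pp2a: float) -> float:
    """((chromatin-bound RNA)/EF1alpha) / ((total RNA)/PP2A)."""
    chromatin_bound = relative_level(ct_chromatin_target, ct_chromatin_ef1alpha)
    total = relative_level(ct_total_target, ct_total_pp2a)
    return chromatin_bound / total


if __name__ == "__main__":
    # Invented Ct values for class II COOLAIR in the two fractions.
    print(chromatin_binding_ratio(26.0, 24.0, 27.0, 24.0))
```

Normalizing each fraction to its own reference gene before taking the ratio means that a change in overall class II expression does not by itself change the chromatin-binding ratio.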
ChIRP–qPCR assay
ChIRP was performed as previously outlined, with some modifications4,39,40. Antisense DNA probes were designed against the distal exon sequence of COOLAIR class II and biotinylated at the 3′ end; probes are listed in Supplementary Table 1. Then, 3 g of warm-grown seedlings were crosslinked in 3% (vol/vol) formaldehyde at room temperature under vacuum. Crosslinking was then quenched with 0.125 M glycine for 5 min. Crosslinked plants were ground into a fine powder and lysed in 50 ml of cell lysis buffer (20 mM Tris-HCl pH 7.5, 250 mM sucrose, 25% glycerol, 20 mM KCl, 2.5 mM MgCl2, 0.1% NP-40 and 5 mM DTT). The lysate was filtered through two layers of Miracloth (Merck, D00172956) and pelleted by centrifugation. The pellets were washed twice with 10 ml of nuclear wash buffer (20 mM Tris-HCl pH 7.5, 2.5 mM MgCl2, 25% glycerol, 0.3% Triton X-100 and 5 mM DTT). The nuclear pellet was then resuspended in nuclear lysis buffer (50 mM Tris-HCl pH 7.5, 10 mM EDTA, 1% SDS, 0.1 mM PMSF and 1 mM DTT) and sonicated using a Bioruptor ultrasonicator (Diagenode). All of the buffers were supplemented with 0.1 U μl−1 RNaseOUT (Life Technologies), 1 mM PMSF and Roche cOmplete tablets to preserve the integrity of any RNA–protein and protein–protein complexes. The following steps were performed as previously described40. For each reaction, 30 μl pre-blocked Streptavidin C1 magnetic beads (Thermo Fisher Scientific, 65001) were used. Then, 20 μl of RNase A/T1 Mix (Thermo Fisher Scientific, EN0551) instead of RNaseOUT was added into the RNase+ reactions (Fig. 4e), just before the hybridization (at 37 °C for 4 h) started; these samples were used as the control for background noise. RNA was eluted and reverse transcribed using SuperScript IV Reverse Transcriptase (ThermoFisher, 18090050) with gene-specific primers. COOLAIR enrichment and the eluted DNA were analysed by RT–qPCR. All primers used for reverse transcription and RT–qPCR are listed in Supplementary Table 1.
Electrophoretic mobility shift assays
Electrophoretic mobility shift assays (EMSAs) were performed as described previously21 using oligonucleotides end-labelled with Cy5 (DNA) or FAM (RNA). Oligonucleotide sequences are shown in Supplementary Table 1. EMSAs were run on home-made 15% polyacrylamide gels with 40 mM Tris-acetate (pH 7.4) and 10 mM MgCl2 at 15 V cm−1. Gel images were taken with a Typhoon FLA 9500 fluorescence reader (GE Healthcare Life Sciences). Sequences for the positive control rDNA enhancer En3-PAPAS were obtained from a previous study21.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41586-022-05135-9.
Acknowledgements
This work was funded by the European Research Council (grant 680324; to Y.D.), a Wellcome Senior Investigator award (grant 210654; to C.D.), a Royal Society Professorship (RP\R1\180002; to C.D.), the Biotechnology and Biological Sciences Research Council (BB/L025000/1; to Y.D.) and the Institute Strategic Programmes GRO (BB/J004588/1) and GEN (BB/P013511/1) to Y.D. and C.D.
Author contributions
M.Y., C.D. and Y.D. conceptualized the study. M.Y., P.Z., C.D. and Y.D. wrote the paper. Q.L., P.Z. and R.B. performed the SHAPE probing and RNA extraction. R.B. generated COOLAIR structural mutation constructs and transgenic plants. P.Z. performed the phenotypic analysis, gene-expression and genetic studies as well as the ChIRP assay of the structural mutants. M.Y. and Y.Z. constructed the RNA structure libraries. P.M. performed triplex EMSA experiments. M.Y. and J.C. analysed the sequencing data. C.D. and Y.D. acquired funding. C.D. and Y.D. conducted the project administration. C.D. and Y.D. supervised the study.
Peer review
Peer review information
Nature thanks Howard Chang, Chris Helliwell and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Data availability
Sequencing data have been deposited in the Sequence Read Archive (SRA) under BioProject ID number PRJNA749291. A full list of DNA oligomers, PCR primers and COOLAIR reference sequences is available in Supplementary Table 1. The raw data of RNA-expression level, RT–qPCR and ChIRP–qPCR that support the findings of this study are available as Source Data. Uncropped images of EMSA and RT–qPCR are available in Supplementary Fig. 1. Accession numbers (from The Arabidopsis Information Resource (TAIR; https://www.arabidopsis.org/)) for the genes analysed in this study are FLC (At5g10140) and COOLAIR (At5g01675). Standard reference genes EF1alpha (At5g60390), PP2A (At1g13320) and UBC (At5g25760) for gene expression were used for normalization. Source data are provided with this paper.
Code availability
Code is publicly available at GitHub (https://github.com/DingLab-RNAstructure/smStructure-seq).
Competing interests
A patent application (LU501541) naming Y.D., M.Y., J.C. and Y.Z. has been filed by the John Innes Centre for the technology described in this paper.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Minglei Yang, Pan Zhu
Change history
8/31/2022
In the version of this article initially published, the first author listed in ref. 32 was incorrect and has now been amended in the HTML and PDF versions of the article.
Contributor Information
Caroline Dean, Email: caroline.dean@jic.ac.uk.
Yiliang Ding, Email: yiliang.ding@jic.ac.uk.
Extended data is available for this paper at 10.1038/s41586-022-05135-9.
Supplementary information
The online version contains supplementary material available at 10.1038/s41586-022-05135-9.
References
- 1. Swiezewski S, Liu F, Magusin A, Dean C. Cold-induced silencing by long antisense transcripts of an Arabidopsis Polycomb target. Nature. 2009;462:799–802. doi: 10.1038/nature08618.
- 2. Xu C, et al. R-loop resolution promotes co-transcriptional chromatin silencing. Nat. Commun. 2021;12:1790. doi: 10.1038/s41467-021-22083-6.
- 3. Zhao Y, et al. Natural temperature fluctuations promote COOLAIR regulation of FLC. Genes Dev. 2021;35:888–898. doi: 10.1101/gad.348362.121.
- 4. Csorba T, Questa JI, Sun Q, Dean C. Antisense COOLAIR mediates the coordinated switching of chromatin states at FLC during vernalization. Proc. Natl Acad. Sci. USA. 2014;111:16160–16165. doi: 10.1073/pnas.1419030111.
- 5. Hawkes EJ, et al. COOLAIR antisense RNAs form evolutionarily conserved elaborate secondary structures. Cell Rep. 2016;16:3087–3096. doi: 10.1016/j.celrep.2016.08.045.
- 6. Zhu P, Lister C, Dean C. Cold-induced Arabidopsis FRIGIDA nuclear condensates for FLC repression. Nature. 2021;599:657–661. doi: 10.1038/s41586-021-04062-5.
- 7. Li P, Tao Z, Dean C. Phenotypic evolution through variation in splicing of the noncoding RNA COOLAIR. Genes Dev. 2015;29:696–701. doi: 10.1101/gad.258814.115.
- 8. Yang X, Yang M, Deng H, Ding Y. New era of studying RNA secondary structure and its influence on gene regulation in plants. Front. Plant Sci. 2018;9:671. doi: 10.3389/fpls.2018.00671.
- 9. Aw JGA, et al. Determination of isoform-specific RNA structure with nanopore long reads. Nat. Biotechnol. 2021;39:336–346. doi: 10.1038/s41587-020-0712-z.
- 10. Morandi E, et al. Genome-scale deconvolution of RNA structure ensembles. Nat. Methods. 2021;18:249–252. doi: 10.1038/s41592-021-01075-w.
- 11. Tomezsko PJ, et al. Determination of RNA structural diversity and its role in HIV-1 RNA splicing. Nature. 2020;582:438–442. doi: 10.1038/s41586-020-2253-5.
- 12. Yang H, Howard M, Dean C. Antagonistic roles for H3K36me3 and H3K27me3 in the cold-induced epigenetic switch at Arabidopsis FLC. Curr. Biol. 2014;24:1793–1797. doi: 10.1016/j.cub.2014.06.047.
- 13. Spitale RC, et al. RNA SHAPE analysis in living cells. Nat. Chem. Biol. 2013;9:18–20. doi: 10.1038/nchembio.1131.
- 14. Wenger AM, et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat. Biotechnol. 2019;37:1155–1162. doi: 10.1038/s41587-019-0217-9.
- 15. Cannone JJ, et al. The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BMC Bioinform. 2002;3:2. doi: 10.1186/1471-2105-3-2.
- 16. Mathews DH, Moss WN, Turner DH. Folding and finding RNA secondary structure. Cold Spring Harb. Perspect. Biol. 2010;2:a003665. doi: 10.1101/cshperspect.a003665.
- 17. Rouskin S, Zubradt M, Washietl S, Kellis M, Weissman JS. Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo. Nature. 2014;505:701–705. doi: 10.1038/nature12894.
- 18. Legiewicz M, et al. Resistance to RevM10 inhibition reflects a conformational switch in the HIV-1 Rev response element. Proc. Natl Acad. Sci. USA. 2008;105:14365–14370. doi: 10.1073/pnas.0804461105.
- 19. Sun Q, Csorba T, Skourti-Stathaki K, Proudfoot NJ, Dean C. R-loop stabilization represses antisense transcription at the Arabidopsis FLC locus. Science. 2013;340:619–621. doi: 10.1126/science.1234848.
- 20. Zhao Z, Sentürk N, Song C, Grummt I. lncRNA PAPAS tethered to the rDNA enhancer recruits hypophosphorylated CHD4/NuRD to repress rRNA synthesis at elevated temperatures. Genes Dev. 2018;32:836–848. doi: 10.1101/gad.311688.118.
- 21. Maldonado R, Filarsky M, Grummt I, Längst G. Purine- and pyrimidine-triple-helix-forming oligonucleotides recognize qualitatively different target sites at the ribosomal DNA locus. RNA. 2018;24:371–380. doi: 10.1261/rna.063800.117.
- 22. Li Z, Jiang D, He Y. FRIGIDA establishes a local chromosomal environment for FLOWERING LOCUS C mRNA production. Nat. Plants. 2018;4:836–846. doi: 10.1038/s41477-018-0250-6.
- 23. Hepworth J, et al. Natural variation in autumn expression is the major adaptive determinant distinguishing Arabidopsis FLC haplotypes. eLife. 2020;9:e57671. doi: 10.7554/eLife.57671.
- 24. Chung BYW, et al. An RNA thermoswitch regulates daytime growth in Arabidopsis. Nat. Plants. 2020;6:522–532. doi: 10.1038/s41477-020-0633-3.
- 25. Li W, et al. EIN2-directed translational regulation of ethylene signaling in Arabidopsis. Cell. 2015;163:670–683. doi: 10.1016/j.cell.2015.09.037.
- 26. Jones JDG, et al. Effective vectors for transformation, expression of heterologous genes, and assaying transposon excision in transgenic plants. Transgenic Res. 1992;1:285–297. doi: 10.1007/BF02525170.
- 27. Chaisson MJ, Tesler G. Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory. BMC Bioinform. 2012;13:238. doi: 10.1186/1471-2105-13-238.
- 28. Spitale RC, et al. Structural imprints in vivo decode RNA regulatory mechanisms. Nature. 2015;519:486–490. doi: 10.1038/nature14263.
- 29. Do CB, Woods DA, Batzoglou S. CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics. 2006;22:e90–e98. doi: 10.1093/bioinformatics/btl246.
- 30. Hamada M, Kiryu H, Sato K, Mituyama T, Asai K. Prediction of RNA secondary structure using generalized centroid estimators. Bioinformatics. 2009;25:465–473. doi: 10.1093/bioinformatics/btn601.
- 31. Thiel BC, Beckmann IK, Kerpedjiev P, Hofacker IL. 3D based on 2D: calculating helix angles and stacking patterns using forgi 2.0, an RNA Python library centered on secondary structure elements. F1000Res. 2019;8:287. doi: 10.12688/f1000research.18458.2.
- 32. Pedregosa F, et al. Scikit-Learn: machine learning in Python. J. Mach. Learn. Res. 2011;12:2825–2830.
- 33. Zhang Y, et al. A stress response that monitors and regulates mRNA structure is central to cold shock adaptation. Mol. Cell. 2018;70:274–286. doi: 10.1016/j.molcel.2018.02.035.
- 34. Smola MJ, Rice GM, Busan S, Siegfried NA, Weeks KM. Selective 2′-hydroxyl acylation analyzed by primer extension and mutational profiling (SHAPE-MaP) for direct, versatile and accurate RNA structure analysis. Nat. Protoc. 2015;10:1643–1669. doi: 10.1038/nprot.2015.103.
- 35. Lorenz R, et al. ViennaRNA package 2.0. Algorithms Mol. Biol. 2011;6:26. doi: 10.1186/1748-7188-6-26.
- 36. Box MS, Coustham V, Dean C, Mylne JS. Protocol: a simple phenol-based method for 96-well extraction of high quality RNA from Arabidopsis. Plant Methods. 2011;7:7. doi: 10.1186/1746-4811-7-7.
- 37. Wu Z, et al. Quantitative regulation of FLC via coordinated transcriptional initiation and elongation. Proc. Natl Acad. Sci. USA. 2015;113:218–223. doi: 10.1073/pnas.1518369112.
- 38. Wu Z, et al. RNA binding proteins RZ-1B and RZ-1C play critical roles in regulating pre-mRNA splicing and gene expression during development in Arabidopsis. Plant Cell. 2016;28:55–73. doi: 10.1105/tpc.15.00949.
- 39. Zhu P, et al. Arabidopsis small nucleolar RNA monitors the efficient pre-rRNA processing during ribosome biogenesis. Proc. Natl Acad. Sci. USA. 2016;113:11967–11972. doi: 10.1073/pnas.1614852113.
- 40. Chu C, Quinn J, Chang HY. Chromatin isolation by RNA purification (ChIRP). J. Vis. Exp. 2012. doi: 10.3791/3912.
- 41. Yang M, et al. Intact RNA structurome reveals mRNA structure-mediated regulation of miRNA cleavage in vivo. Nucleic Acids Res. 2020;48:8767–8781. doi: 10.1093/nar/gkaa577.
- 42. Dowell RD, Eddy SR. Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC Bioinform. 2004;5:71. doi: 10.1186/1471-2105-5-71.
- 43. Jiang T, Wang L, Zhang K. Alignment of trees—an alternative to tree edit. Theor. Comput. Sci. 1995;143:137–148. doi: 10.1016/0304-3975(95)80029-9.
- 44. Deigan KE, Li TW, Mathews DH, Weeks KM. Accurate SHAPE-directed RNA structure determination. Proc. Natl Acad. Sci. USA. 2009;106:97–102. doi: 10.1073/pnas.0806929106.
- 45. Buske FA, Bauer DC, Mattick JS, Bailey TL. Triplexator: detecting nucleic acid triple helices in genomic and transcriptomic data. Genome Res. 2012;22:1372–1381. doi: 10.1101/gr.130237.111.