ABSTRACT
In the last decade, the field of epitranscriptomics highlighted a wide array of post-transcriptional modifications in human RNAs, including microRNAs (miRNAs). Recent reports showed that human miRNAs undergo cytosine methylation. We describe the first high-throughput NGS-based method (BS-miRNA-seq) and an analysis pipeline (MAmBA) to attain high-resolution mapping of (hydroxy)-methyl-5-cytosine ((h)m5C) modifications in human miRNAs. Our method uncovers that miRNAs undergo widespread cytosine modification in various sequence contexts.
Furthermore, validation of our data with specific antibodies reveals both m5C and hm5C residues in human mature miRNAs. BS-miRNA-seq and MAmBA may contribute to the precise mapping of (h)m5C on miRNAs in various cell types and tissues, a key achievement towards the understanding of the functional implications of this modification in miRNAs. MAmBA is available for download at https://github.com/flcvlr/MAmBA
KEYWORDS: bisulphite-sequencing, Next Generation Sequencing (NGS), microRNA (miRNA), epitranscriptomics, 5-methyl-cytosine (m5C)
Introduction
Post-transcriptional (epitranscriptomic) modifications of RNA molecules represent an emerging layer of gene expression regulation [1,2]. Epitranscriptomic modifications include 1-methyl-Adenosine (m1A), 6-methyl-Adenosine (m6A), pseudouridine (Ψ), A-to-I editing and 5-methyl-Cytosine (m5C).
Recently, it has been shown that microRNAs (miRNAs) undergo epitranscriptomic modifications, such as 6-methyl-Adenosine[3], 3ʹ uridylation[4] and A-to-I RNA editing[5]. 3ʹ uridylation affects miRNA stability[6], while A-to-I editing plays a key role in some types of cancer [7,8], highlighting the biological relevance of these epitranscriptomic modifications. Very recently, miRNAs have been shown to undergo m5C modification [9,10].
m5C was also identified in abundant noncoding RNAs (transfer RNAs, tRNA and ribosomal RNAs, rRNA[11]), and more recently also in less abundant RNAs, including mRNAs [12,13]. In these molecules, m5C modification is involved in the stabilization of the secondary structure[14], in the mRNA export to the cytoplasm and in post-transcriptional regulation[15], suggesting that m5C exerts a profound biological role in RNA functions. Interestingly, an overlap between AGO2 binding sites and m5C positions was reported in 3ʹ UTR of human mRNAs[12]. Notably, m5C may be converted in vivo into 5-hydroxymethylcytosine (hm5C) by the TET family enzymes, which were proven to be able to process DNA as well as RNA[16]. Although hm5C has not been extensively investigated in mammals, its deposition on mRNAs is associated with translation in Drosophila[17].
Currently, methods to identify m5C and hm5C residues in DNA or RNA molecules rely either on bisulphite treatment or on the use of specific antibodies able to capture either m5C or hm5C. While bisulphite-based methods, coupled with sequencing of treated DNA/RNA, allow a quantitative analysis of the methylation rates at single nucleotide resolution, these approaches do not discriminate between m5C and hm5C. On the other hand, methods based on antibodies allow specific detection of each modification, but do not provide quantitative results about the proportion of molecules harbouring the modification.
In a recent report, m5C modification of miRNAs has been highlighted by mass-spec approaches[9]. Furthermore, the deposition of m5C at CG dinucleotides by DNMT3A in mammalian miRNAs has been reported very recently[10].
In an attempt to characterize 5-(hydroxy)methyl-Cytosine ((h)m5C) in microRNAs, we found that, due to extensive fragmentation of abundant RNAs (tRNAs, rRNAs), short reads mapping to miRNAs are extremely rare in existing bisulphite RNA-seq datasets, which prevents comprehensive profiling of these modifications in miRNAs. Indeed, even in the recent dataset by Cheray et al[10], the percentage of reads mapping to miRNAs is ~0.25%. We, therefore, set out to develop a dedicated high-throughput NGS strategy to systematically profile (h)m5C in miRNAs at single nucleotide resolution. We describe bisulphite miRNA sequencing (BS-miRNA-seq), a fast and cheap protocol, and a dedicated analysis pipeline (Methylation Assessment of miRNA after Bisulphite Analysis, MAmBA) to quickly profile (h)m5C in mature miRNAs by NGS after bisulphite treatment.
We applied this protocol to investigate the (h)m5C profile of human cell lines as well as primary human Peripheral Blood Mononucleated Cells (PBMC). To our surprise, (h)m5C is a widespread modification in human miRNAs and is not restricted to the CG sequence context, suggesting that, in addition to DNMT3A, other RNA methyltransferases might be involved. We validated our finding by miRNA immunoprecipitation with antibodies against both m5C and hm5C, thus showing here for the first time that human miRNAs harbour hm5C.
Materials and methods
Cell cultures
HeLaS3 and HEK293T cells were grown in DMEM medium supplemented with 10% (v/v) foetal bovine serum, 2 mM L-glutamine and penicillin-streptomycin at 37°C in 5% CO2.
PBMC were purified from buffy coats obtained through the local blood bank using Lympholyte-H (Cedarlane) according to manufacturer’s protocol.
Bisulphite treatment
For unmethylated C conversion, the EZ RNA Methylation™ Kit (Zymo Research) was used following the manufacturer’s instructions (input RNA between 0.5–1 μg).
RNA isolation and PAGE purification of small RNAs
Total RNA was isolated from cell cultures or PBMCs using the Direct-zol™ RNA MiniPrep Kit (Zymo Research) according to the manufacturer’s protocol.
AGO1- and AGO2-associated RNAs were purified by immunoprecipitation as previously described[18]. Total RNA (100–200 μg per sample) was size-fractionated on a 15 × 15 cm 15% denaturing polyacrylamide gel [0.5X TBE, 8 M urea, 15% acrylamide (29:1 acryl:bis-acryl)], which was run in 0.5X TBE buffer at 25 mA constant current until the bromophenol blue dye front was 1–2 cm from the bottom of the gel. A gel slice corresponding to small RNAs of 15–35 nt was excised from the gel; the excised band was defined by the mobility of small RNA size markers (ZR small-RNA™, Zymo Research). The small RNAs were eluted by incubating the crushed gel slice in 10 ml of RNase-free 1 M NaCl solution overnight at 4°C with rocking. The eluted small RNA was precipitated by adding 10 μg glycogen and 0.8 volumes of isopropanol, and the pellet was resuspended in RNase-free water. The amount of small RNA recovered generally corresponds to 1% of the total RNA loaded.
RNA immunoprecipitation and RT–qPCR
For Methylated RNA ImmunoPrecipitation (Me-RIP), 500 ng of the small RNA fraction of HeLaS3 cells were incubated with 2.5 μg of anti-m5C (clone 32E2) or anti-5hmC (Active Motif) antibodies, or with isotype-matched immunoglobulin, in IP buffer (10 mM Tris-HCl pH 7.4, 150 mM NaCl, 0.1% Igepal CA-630, RNasin 400 U/ml, Heparin 200 μg/ml, 1X protease inhibitor cocktail (Sigma)) overnight at 4°C. Samples were then incubated at 4°C for 2.5 h with 15 μl of equilibrated Protein A MagBeads (GenScript) or SureBeads Protein G Mag beads (BioRad) for m5C or hm5C immunoprecipitation, respectively. Each IP was washed three times for 5 min with 1 ml IP buffer, and 0.2 ml TriFast (EuroClone) was added to each IP sample for RNA extraction.
Reverse transcription was performed from 50 ng of small RNA fraction as input or 1/4 of Me-RIP samples, using SuperScript II Reverse Transcriptase (Thermo Fisher Scientific), following the manufacturer’s protocol. Specific primers were used for miRNA reverse transcription. qPCR was performed with GoTaq qPCR Master Mix (Promega); primers are listed in Supplementary Table 1. Quantification was normalized to mature miRNA hsa-let-7a, whose sequence does not contain any C residue, and calculated using the 2^-ΔΔCt method. P-values were calculated using a paired one-tailed Wilcoxon/Mann–Whitney test.
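As an illustration of the 2^-ΔΔCt calculation mentioned above, the following minimal Python sketch computes the enrichment of a target miRNA in an IP sample relative to a control sample, after normalization to a reference miRNA. The function name, argument names and Ct values are hypothetical and chosen for illustration only; they are not taken from this study.

```python
def fold_enrichment(ct_target_ip, ct_ref_ip, ct_target_ctrl, ct_ref_ctrl):
    """Relative enrichment by the 2^-ddCt method (illustrative sketch).

    ct_target_* : Ct of the miRNA of interest in the IP and control samples
    ct_ref_*    : Ct of the normalizer (e.g. a miRNA lacking C residues)
    """
    delta_ip = ct_target_ip - ct_ref_ip        # dCt in the IP sample
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl  # dCt in the control sample
    delta_delta = delta_ip - delta_ctrl        # ddCt
    return 2.0 ** (-delta_delta)


# Invented Ct values: the target is 2 cycles "earlier" in the IP than in the
# control once both are normalized to the reference.
enrichment = fold_enrichment(24.0, 20.0, 26.0, 20.0)
```

With these invented values, ΔCt is 4 in the IP and 6 in the control, so ΔΔCt is -2 and the relative enrichment is 2^2.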
Table 1.
NGS libraries characteristics. Features of all libraries analysed in this work are reported
| SRA | Source | Purification method | Reads (millions) | % reads aligned to tRNA/rRNA | % reads aligned to gencode.v31 | % reads aligned to miRNA | Independent biological replicates | C to U conversion efficiency, tRNAGly(GCC) fragment (%) |
|---|---|---|---|---|---|---|---|---|
| SRR11931321 | HeLaS3 | total RNA | 11.5 | 70.9 | 17.5 | 0.15 | 1/1 | 98.8 |
| SRR11931320 | HeLaS3 | AGO1 RIP | 15.5 | 49.1 | 31.9 | 6.39 | 1/1 | 98.9 |
| SRR12691228 | HeLaS3 | AGO2 RIP | 36.7 | 46.8 | 36.7 | 1.74 | 1/2 | 99.9 |
| SRR11931316 | HeLaS3 | AGO2 RIP | 45.2 | 28.0 | 25.6 | 1.23 | 2/2 | 98.7 |
| SRR11931314 | HEK293T | AGO2 RIP | 18.8 | 31.6 | 35.3 | 8.34 | 1/1 | 99.3 |
| SRR11931312 | HEK293T | PAGE | 26.6 | 69.6 | 6.7 | 0.21 | 1/2 | 99.8 |
| SRR11931311 | HEK293T | PAGE | 22.4 | 60.0 | 6.6 | 0.76 | 2/2 | 99.6 |
| SRR11931310 | HeLaS3 | PAGE | 9.2 | 40.9 | 30.9 | 6.80 | 1/3 | 99.4 |
| SRR11931319 | HeLaS3 | PAGE | 14.4 | 34.3 | 13.9 | 3.29 | 2/3 | 99.8 |
| SRR11931318 | HeLaS3 | PAGE | 12.9 | 57.5 | 14.8 | 1.91 | 3/3 | 99.6 |
| N/A | PBMC | PAGE | 17.1 | 7.1 | 85.6 | 34.89 | 1/1 | 99.3 |
Next generation sequencing
All libraries were prepared and sequenced at the Institute for Applied Genomics (IGA), Udine (Italy), using ILLUMINA technology. For miRNA sequencing, 200 ng of converted small RNA were sent to IGA following bisulphite treatment.
Data analysis
Fastq files were processed with cutadapt[19] to remove adapters and terminal Ns using options ‘-m 18 --trim-n -a ADAPTORSEQ’. The overall strategy for methylation assessment is largely inspired by the one used by Bismark[20], a software tool developed to assess C methylation on DNA. However, several important differences exist, mainly due to the fact that our algorithm is focused on strand-specific alignment to a transcriptome while Bismark was developed to align unstranded reads on genomes.
All C in original reads were converted into T before alignment. Alignments were performed on C to T-converted reference sequences in two sequential rounds:
I) reads were aligned using bowtie2 to an index obtained using human rRNA and tRNA sequences (retrieved from RNAcentral [21–24]), searching only for alignments to the forward strand (i.e. the transcript strand) with the following options: ‘-N 0 -L 18 --score-min L,0,-0.2 --norc’
II) sequences without any valid alignment in step I were aligned using tophat2 to the human transcriptome (gencode v31, hg38 genome build); also in this case, only valid alignments to the transcript strand were taken into account. The following options were used: ‘--b2-N 0 --b2-L 18 --library-type fr-secondstrand --no-sort-bam -T/--transcriptome-only’
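The in silico C to T conversion applied to reads before alignment can be sketched as follows. This is a minimal illustration, assuming standard four-line FASTQ records; the function names are hypothetical and this is not the code used by MAmBA. Only the sequence line of each record is converted, so read identifiers and quality strings remain unchanged and the original read can later be retrieved by its identifier.

```python
from typing import Iterable, Iterator


def c_to_t(seq: str) -> str:
    """Replace every C with T, preserving case."""
    return seq.replace("C", "T").replace("c", "t")


def convert_fastq(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ lines with C->T conversion applied to sequence lines only.

    Assumes strict 4-line records: header, sequence, '+', qualities.
    """
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        # Line 1 of every 4-line record (0-based) is the sequence.
        yield c_to_t(line) if index % 4 == 1 else line


record = ["@read1\n", "ACGTC\n", "+\n", "IIIII\n"]
converted = list(convert_fastq(record))
```

The same conversion would be applied to the reference fasta sequences before building the alignment index, so that converted reads and converted references remain comparable.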
A custom perl script was used to compare the genomic reference sequence of primary alignments with that of the corresponding read in the original fastq file (before C to T conversion). The script summarizes the number of bisulphite-converted C at single nucleotide resolution by writing a single BED formatted line for each non-converted C in a read on an output stream and, on a different output stream, a single BED formatted line for each C (converted or not) overlapped by a read. No assumptions were made about the sequence context of C residues. Non-conversion frequency at single nucleotide resolution for each pre-miRNA (retrieved from miRBase v22) was obtained using bedtools[25]. First, coverage and non-converted C information was summarized using bedtools merge (command line options: ‘-d -1 -s -c 6,4,5 -o first,count_distinct,sum’) on each of the two summary files; afterwards, the two files were intersected using bedtools intersect (command line options: ‘-sorted -s -wao’). To retrieve data on C residues within pre-miRNA hairpins, bedtools intersect was run (command line options: ‘-wo -s -sorted’) and the output parsed with awk to yield the data reported in Supplementary Tables 6–10 (with coordinates relative to pre-miRNA).
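The per-position bookkeeping described above can be illustrated with a short Python sketch. It is not the perl script distributed with MAmBA: it assumes an ungapped alignment on the transcript strand, and the function names, coordinates and sequences are hypothetical. For each reference C covered by a read, coverage is incremented; if the original (unconverted) read still carries a C at that position, the non-conversion count is incremented as well.

```python
from collections import Counter


def tally_read(ref_seq, read_seq, start, covered, non_converted):
    """Update per-position counters for one ungapped, same-strand alignment.

    ref_seq  : reference sequence spanned by the read (original, with C)
    read_seq : original read sequence, before in silico C->T conversion
    start    : reference coordinate of the first aligned base
    """
    for offset, (ref_base, read_base) in enumerate(zip(ref_seq, read_seq)):
        if ref_base != "C":
            continue  # only reference cytosines are informative
        position = start + offset
        covered[position] += 1
        if read_base == "C":  # C survived bisulphite treatment
            non_converted[position] += 1


def non_conversion_frequency(position, covered, non_converted):
    """Fraction of reads covering `position` that retained a C."""
    return non_converted[position] / covered[position]


covered, non_converted = Counter(), Counter()
tally_read("ACGCT", "ATGCT", 10, covered, non_converted)  # second C not converted
tally_read("ACGCT", "ATGTT", 10, covered, non_converted)  # both C converted
```

In this toy example, position 13 is covered twice and non-converted once, while position 11 is covered twice and always converted.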
Data were plotted using R custom scripts. Adjusted P-values for non-conversion frequency at each position were obtained using the bisRNA R package[26]. GNU Parallel[27] was used to allow parallel processing at some steps of MAmBA.
To investigate association between (h)m5C residues and specific sequences, a region of 10 nucleotides centred on the modified C was submitted to both MEME Suite[28] (command line options: ‘-rna -oc . -nostatus -time 14400 -mod zoops -nmotifs 5 -minw 4 -maxw 10 -objfun classic -markov_order 1’) and WebLogo[29] tools. The MEME model cannot accommodate overlapping sequences; therefore, for those miRNAs in which multiple C residues were methylated within the same pre-miRNA and attained statistically significant methylation, only the 21 nt region centred on the C residue with the lowest P-value was included in the analysis.
Conversion efficiency was confirmed on endogenous 28S and 18S rRNAs recovered in the NGS reads. Furthermore, we also took advantage of an abundant tRNA fragment corresponding to the 5ʹ half of tRNAGly(GCC), which co-purifies with miRNAs.
All scripts and software are available for download from Github: https://github.com/flcvlr/MAmBA.
Results
Establishment of Bisulphite microRNA sequencing (BS-miRNA-seq)
Prompted by the finding that miRNAs may be modified by 5-methylation of cytosine [9,10], we set out to further investigate this issue by developing a dedicated protocol of bisulphite sequencing allowing profiling of (h)m5C at single nucleotide resolution in the miRNA fraction of human cell lines as well as of primary cells. Bisulphite sequencing relies on the conversion of unmodified C residues into U during bisulphite treatment. Analysis of non-converted C residues in the sequencing results allows an estimation of the methylation profile. In line with recent literature, we will refer to non-converted C residues when discussing the outcome of sequencing and use the term ‘methylated’ only to refer to those positions at which the number of non-conversion events attains statistical significance (adjusted P-value < 0.05), implying that non-conversion is due to C modification.
NGS datasets of small (<200 nt) RNAs treated with bisulphite (PRJNA292863, PRJNA189560) are already available and were obtained with the aim of characterizing m5C in murine tRNAs [30,31]. However, these datasets are not suitable for miRNA analysis. Since bisulphite treatment causes rapid degradation of RNA[11], most reads corresponding to small (20 to 30 nt) RNAs arise from tRNA fragmentation, resulting in very poor recovery of miRNA reads, ranging from 0.0001% to 0.0345% (Supplementary Table 2).
Table 2.
Detection of (h)m5C by BS-miRNA-seq in miRNAs by different purification methods. For each biological sample, the number of mature microRNAs detected with at least 30 reads and the number of miRNAs harbouring a putative (h)m5C site are reported, along with the number of independent biological replicates that were merged for the analysis
| Source | Purification method | Merged replicates | Number of mature miRNAs analysed (> 30 reads) | Number of miRNAs harbouring at least one non-converted C residue (adj p-value < 0.05) |
|---|---|---|---|---|
| HeLaS3 | total RNA | 1 | 35 | 12 |
| HeLaS3 | AGO1 RIP | 1 | 193 | 93 |
| HeLaS3 | AGO2 RIP | 2 | 218 | 103 |
| HEK293T | AGO2 RIP | 1 | 233 | 83 |
| HEK293T | PAGE | 2 | 122 | 48 |
| HeLaS3 | PAGE | 3 | 202 | 85 |
| PBMC | PAGE | 1 | 285 | 130 |
At first, we attempted to identify (h)m5C in miRNAs by treating total RNA from HeLaS3 cells with bisulphite and subsequently sequencing by NGS the miRNA fraction (15–40 nt) rather than all RNAs shorter than 200 nt. This approach results in an approximately 10-fold increase in the fraction of reads mapping to miRNAs (0.15%). However, this still represents a minor fraction of mapped reads, and most miRNAs did not attain sufficient coverage to allow a rigorous analysis of non-converted cytosines (Table 1). In fact, due to the extensive fragmentation of large RNA molecules occurring during bisulphite treatment, we still mainly recovered and sequenced tRNA and rRNA fragments rather than miRNAs.
In order to improve miRNA recovery, we performed the isolation of AGO1-bound small-RNAs by AGO1 RNA Immunoprecipitation (RIP), followed by bisulphite treatment and NGS. Notably, bisulphite treatment of the immunopurified small RNA fraction results in a much more reliable result, as fragments arising from larger RNA molecules are significantly reduced, and miRNAs account for a larger part (6.4%) of the recovered reads (Table 1). These experimental settings allowed us to achieve sufficient sequencing depth to reliably characterize non-conversion rates in miRNA. We set an arbitrary threshold of at least 30 reads to include a mature miRNA in our analysis. 193 mature miRNAs attained this requirement and were further analysed for identification of significantly non-converted (bona fide methylated) cytosines. This strategy was further validated by looking at the (h)m5C profile of small RNAs selected by AGO2 RIP in both HeLaS3 and HEK293T cells (Table 1).
To improve the assessment of the global profile of (h)m5C in miRNAs and avoid any bias for AGO-bound miRNAs, we set out to specifically isolate mature miRNAs (15 to 35 nt long RNAs) from total RNA by PAGE purification before bisulphite treatment and NGS. We called this method ‘Bisulphite microRNA-sequencing’ (BS-miRNA-seq).
BS-miRNA-seq (Fig. 1A) resulted in an about 40-fold increase in the recovery of reads mapping to mature miRNAs after NGS of bisulphite treated small RNAs (Table 1).
Figure 1.

(A) Schematic representation of the experimental workflow for BS-miRNA-seq. (B) Flowchart of MAmBA software
We also analysed small RNAs extracted from total RNA obtained from Peripheral Blood Mononucleated Cells (PBMCs) purified from one healthy donor. These data provide a first proof of principle that (h)m5C deposition occurs and is detectable with our protocol also in primary human cells (Table 2).
Methylation Assessment of miRNAs after Bisulphite Analysis (MAmBA): a bioinformatic tool to profile (h)m5C in microRNAs
We set up a bioinformatic pipeline to analyse (h)m5C on miRNAs following bisulphite treatment. MAmBA processes each fastq.gz file by performing the following steps (Fig. 1B):
1) Adapter trimming (cutadapt[19]);
2) Bisulphite conversion efficiency assessment using 18S, 28S, or abundant non-miRNA small RNAs in the sample (e.g. tRNA fragments) or custom synthetic RNA spike-in (optional);
3) Alignment of reads to tRNA and rRNA database (RNACentral[22]) to filter out tRNA and rRNA fragments with bowtie2;
4) Alignment to the entire transcriptome with tophat2;
5) Computation of coverage and non-conversion frequency at all C positions with bedtools[25];
6) Summarizing results for C residues lying within miRNA precursors;
7) Generation of a methylation report for each pre-miRNA highlighting mature miRNAs positions and overall coverage at single nucleotide resolution, including significantly methylated positions (with bisRNA package).
Steps 3 and 4 are performed using a copy of the original fastq file in which all C have been converted into T; the same C to T conversion was performed for the relevant reference fasta files before building bowtie2 indexes. This approach is largely inspired by the one used in Bismark[20]; however, all alignments were performed taking into account that ILLUMINA small RNA-sequencing protocols are stranded (second-strand) and hence matches to the reverse complement of the transcriptome were not taken into account. Tophat2 alignment was limited to the gencode v31 transcriptome; reads without matches on the transcriptome were not subsequently aligned to the genome. By assuming that all reads arise from RNA and are stranded, the complexity of the reference can be greatly reduced, minimizing multiple mappings.
An intrinsic limitation in BS-Seq is represented by the alignment step, as the overall conversion of C residues into U may result in illegitimate mapping. We chose a very conservative approach: MAmBA only takes into account reads that are longer than 17 nt and yield a single valid alignment on the transcriptome. In the human genome, 1855 out of 2883 mature miRNAs reported in miRBase[32] meet these conditions. The remaining 1028 canonical mature miRNAs are either shorter than 18 nt (66 miRNAs), or have alignments to tRNA/rRNA (32 miRNAs), or have multiple matches on the transcriptome (930 miRNAs). In this latter case, we observed that even after discarding all reads with multiple matches, a methylation profile can still be obtained thanks to iso-miR reads, which extend into the pre-miRNA for 1 or 2 nucleotides; in many instances, this is sufficient to attain a single valid alignment. The percentage of reads aligning to miRNAs reported in Table 1 for each library only takes into account uniquely mapped reads, which contribute to the (h)m5C profile reported.
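The conservative read selection described above (reads longer than 17 nt with a single valid alignment) amounts to a simple filter. The sketch below is only illustrative: the tuple layout, the function names and the example reads are hypothetical, and in practice the number of valid alignments would be derived from the aligner output.

```python
MIN_READ_LENGTH = 18  # reads must be longer than 17 nt


def keep_read(read_length, n_valid_alignments):
    """True if a read is long enough and maps to exactly one location."""
    return read_length >= MIN_READ_LENGTH and n_valid_alignments == 1


def filter_reads(reads):
    """Return identifiers of reads passing both criteria.

    reads: iterable of (read_id, read_length, n_valid_alignments) tuples.
    """
    return [read_id for read_id, length, n_aln in reads if keep_read(length, n_aln)]


example_reads = [
    ("too_short", 17, 1),
    ("minimal_unique", 18, 1),
    ("multi_mapped", 22, 3),
    ("isomir_unique", 23, 1),
]
kept = filter_reads(example_reads)
```

In this toy input, only the two uniquely mapped reads of at least 18 nt are retained, mirroring the case of an iso-miR read that becomes uniquely mappable because it extends into the pre-miRNA.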
Following assessment of converted and non-converted C residues at each position on pre-miRNAs, we took advantage of the bisRNA R package[26] to compute adjusted p-values for each putatively (h)m5C modified site.
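To illustrate what adjusting P-values for multiple testing involves, the following sketch implements the generic Benjamini-Hochberg procedure in plain Python. This is an assumption made for illustration: it is not a description of the statistical model or of the adjustment method implemented in the bisRNA package, and the input P-values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values, in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # invented per-site P-values
adj = benjamini_hochberg(raw)
```

Sites whose adjusted P-value falls below the chosen threshold (0.05 in this work) would then be treated as putatively modified.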
MAmBA is available together with a MAmBA reference for human (hg38) and mouse (mm10) genomes, and a tool to generate a custom MAmBA reference for any species, provided that a genome assembly (along with a matched bowtie2 index), tRNA and rRNA fasta files, a transcriptome annotation GTF file (including miRNA transcripts), and a miRBase annotation are available. Generation of a reference for a large genome (e.g. hg38) on an 8-core desktop PC takes about half an hour with a minimal memory footprint.
Conversion efficiency
A key point in bisulphite RNA-seq analysis is assessing C to U conversion efficiency attained through bisulphite treatment. Endogenous transcripts that may be recovered in the small RNA library may allow estimation of C to U conversion efficiency. We, therefore, checked C to U conversion efficiency on an endogenous small RNA corresponding to a fragment of 31 nt arising from the 5ʹ end of tRNAGly(GCC) herein referred to as 5-tRF-gly(GCC). We chose this molecule as it has a similar size compared to miRNAs and hence is efficiently recovered in our protocol, making 5-tRF-gly(GCC) an ideal control for complete C to U conversion. Conversion efficiency as estimated on 5-tRF-gly(GCC) ranges from 98.7% to 99.9% with an average of 99.4% (Table 1).
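The conversion efficiency estimate can be sketched as the percentage of cytosines in an unmodified control that are read as T after treatment. The function below is a minimal illustration with hypothetical names and invented counts; the list of per-library values is copied from the last column of Table 1 and is used only to recompute the range and average quoted above.

```python
from statistics import mean


def conversion_efficiency(converted_c, total_c):
    """Percentage of control cytosines converted (read as T)."""
    if total_c <= 0:
        raise ValueError("total_c must be positive")
    return 100.0 * converted_c / total_c


# Invented counts: 988 of 1000 control cytosines were converted.
example = conversion_efficiency(988, 1000)

# Per-library conversion efficiencies (%) as listed in Table 1.
table1_efficiency = [98.8, 98.9, 99.9, 98.7, 99.3, 99.8, 99.6, 99.4, 99.8, 99.6, 99.3]
lowest, highest = min(table1_efficiency), max(table1_efficiency)
average = round(mean(table1_efficiency), 1)
```

Recomputing from the Table 1 column gives the same range (98.7% to 99.9%) and rounded average (99.4%) as reported in the text.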
An alternative approach to estimate C to U conversion efficiency relies on 18S and 28S rRNAs. In fact, we have observed that reads mapping on both 18S and 28S rRNA can be recovered in our libraries, presumably due to the great abundance of these transcripts. Therefore, we further checked conversion efficiency on 18S and 28S rRNAs (Supplementary Table 3). The results we obtained are in line with the ones reported in other bisulphite RNA-seq studies [26,33].
Bisulphite miRNA-seq suggests that (h)m5C is a widespread RNA modification in human microRNAs
We considered putatively (h)m5C modified those mature miRNAs in which at least one C residue attained an adjusted P-value < 0.05. In most samples analysed, about 40% of the mature miRNAs are putatively methylated at one or more C residues (Table 2).
Our pipeline yielded a profile of putative (h)m5C modified residues in human miRNAs with a single-nucleotide resolution (as reported in detail in Supplementary Tables 4–10). Representative non-conversion events are reported for four different miRNAs in Fig. 2 for both HeLaS3 (Fig. 2A) and HEK293T (Fig. 2B) cells. A set of representative miRNAs for the PBMC sample is reported in Figure S1.
Figure 2.

Non-converted cytosine profile in HeLaS3 (left) and HEK293T (right) cell lines for four representative putatively modified miRNAs. For each cell line, the profile obtained from PAGE purified small RNAs (green, HeLaS3; red, HEK293T) is reported. Asterisks indicate positions where significantly not-converted C frequency was observed (adj P-value <0.01; HeLaS3: n = 3; HEK293T: n = 2)
While some miRNAs exhibit a very similar profile in both cell lines (Fig. 2, hsa-miR-10a, hsa-miR-17, hsa-miR-186), others have a cell line-specific non-conversion pattern (e.g. hsa-miR-339). Of note, we observed miRNAs displaying consistently non-converted C residues for their entire length (Fig. 2, hsa-miR-10a; Fig S2, hsa-miR-125a), but also miRNAs displaying significant non-conversion only at a single C residue (hsa-miR-17, hsa-miR-339) and miRNAs with different levels of non-conversion at different C residues (Fig. 2, hsa-miR-186; Fig S1, hsa-miR-345; Fig S2, hsa-miR-25).
Sequence context of (h)m5C in human miRNAs
We asked whether (h)m5C in human miRNA is associated with specific sequence contexts. However, exploratory analyses with MEME Suite[28] and WebLogo[29] suggest that neighbouring residues do not affect (h)m5C modification. In fact, we found methylated C residues in very different sequence contexts (Fig. 3) and we could not identify any consensus sequence significantly associated with the (h)m5C. Based on our data, miRNA non-converted C residues are not flanked by the NGGG motif recently reported for NSUN2-dependent m5C sites[34]. These results hold true even when focusing on the highly (> 50% (h)m5C) methylated C residues. This finding is in agreement with a previous report, which highlighted that in general m5C in mammalian RNA is not restricted to CpG dinucleotides[12].
Figure 3.

WebLogo output for sequences surrounding putative (h)m5C residues in miRNA. For each sample, a fasta file containing intervals of 21 nucleotides surrounding significantly non-converted C residues within mature miRNAs was submitted to WebLogo
We also considered the miRNA ‘seed’ sequences (i.e. positions 2–8 of mature miRNAs) that play a key role in guiding RISC recruitment on target mRNAs by base-pairing. We asked whether our data supported any bias towards an enrichment of m5C in this region of miRNAs. Of note, although we observed several non-converted C residues in the seeds of some miRNAs (Fig. 2; Fig S1; Fig S2), we do not observe a significant enrichment of non-converted C residues in miRNA seeds.
Methylated RNA Immunoprecipitation (MeRIP) analysis suggests that in human miRNAs m5C is oxidized to hm5C
Bisulphite sequencing is unable to discriminate between hm5C and m5C. We, therefore, set out to confirm (h)m5C deposition in miRNA by RNA immunoprecipitation with specific antibodies raised either against m5C or against hm5C (meRIP). In agreement with a recent report highlighting that antibodies raised against 5mC in DNA poorly bind m5C in RNA context, we used the recently developed 32E2 monoclonal antibody for m5C IP[35]. Due to lack of antibodies specifically raised against hm5C in RNA, we resorted to an antibody recognizing 5hmC in a DNA context.
We chose to validate in HeLaS3 cells the (h)m5C of hsa-miR-125a-5p, hsa-miR-191-5p, and hsa-miR-25-3p. As negative controls, we analysed an unmethylated (according to bisulphite NGS output) mature miRNA (hsa-miR-21-5p), and a mature miRNA that does not contain any C residue in its sequence (hsa-let-7f-5p). Data were normalized using hsa-let-7a-5p, which also does not contain any C residue within its sequence. Detailed methylation profiles of these six miRNAs are reported in Supplementary Figure S2.
We performed me-RIP on the small (15–35 nt) RNA fraction of HeLaS3 cells. As depicted in Fig. 4, we found a specific enrichment in both m5C- and hm5C-IP for miR-25-3p, miR-125a-5p and miR-191-5p as compared to mock IP (IgG), as assessed by RT–qPCR. As expected, none of our negative controls (miR-21-5p, let-7f-5p) was enriched in either m5C- or hm5C-IP as compared to mock IP.
Figure 4.

MeRIP validation of BS-miRNA-seq. Mature miRNA fraction purified from HeLaS3 cells was immunoprecipitated using specific antibodies raised against m5C (A) or hm5C (B). Immuno-purified mature miRNA abundance was measured by qRT-PCR using specific primers. The enrichment of different miRNAs in the m5C- or 5hmC-IPed sample as compared to IgG was plotted (A: n = 6; B: n = 5, except for miR-21-5p, n = 4). Error bars represent SEM
Although the number of miRNAs tested is relatively small, our data show that a subset of C residues in human miRNAs is methylated into m5C and further oxidized to hm5C. To the best of our knowledge, the presence of hm5C in microRNAs had not been explicitly tested thus far.
DISCUSSION
Here we describe BS-miRNA-seq, a fast, cheap and high-throughput protocol to profile (h)m5C in mature human miRNAs, and MAmBA, a pipeline to analyse BS-miRNA-seq output. To the best of our knowledge, this is the first comprehensive strategy to specifically assess miRNA (h)m5C after bisulphite treatment. (h)m5C in human miRNAs has been only recently reported and is a potentially very relevant modification which might affect miRNA function. Indeed, Cheray et al[10] reported that, in a CpG context, m5C inhibits miRNA activity in human cells.
BS-miRNA-seq is based on bisulphite treatment of the small RNA fraction purified from cell lines and/or primary cells, followed by NGS. This strategy results in a 40-fold improvement in the sequencing depth of miRNAs, compared to miRNA libraries obtained from bisulphite treated total RNA (Table 1). This finding is corroborated by a comparison of our results with the ones obtained in the pioneering work by Cheray et al[10], who reported three libraries with a percentage of reads mapping to miRNAs of 0.31%, 0.20% and 0.24%. In our PAGE-purified datasets, the percentage of reads mapping to miRNAs is 7.98% on average, an approximately 40-fold increase. Notably, purification of small RNAs (15 to 35 nt) prior to bisulphite treatment also minimizes the effects of possible RNA secondary structures that might lead to false positives due to incomplete C to U conversion in highly structured RNA regions. Indeed, coverage plots (Fig. 2, Fig S1 and Fig S2) clearly show that the reads we recovered correspond to ~21 nt mature miRNAs rather than longer miRNA precursors. This rules out possible artefacts due to strong secondary structures of pre-miRNAs. Of note, in most cases, we observed non-conversion rates that differ markedly between C residues belonging to the same mature miRNA; since secondary structures significantly shorter than 20 nt are not stable at the bisulphite treatment incubation temperature (54°C), this argues against the possibility that the non-conversion rates we observe are due to miRNA secondary structures. Furthermore, PAGE purification of 15 to 35 nt small RNAs prior to bisulphite sequencing prevents possible artefacts due to protein binding to RNAs, which has been proposed to result in artefactual incomplete conversion in bisulphite experiments[36].
MAmBA analyzes the sequencing output making no a priori assumption on the sequence context of (h)m5C. Our data in fact unveil that (h)m5C is not only found in a CpG context in human mature miRNAs, but is rather a widespread modification whose deposition is not generally associated with any specific sequence motifs in the cell types we analysed (Fig. 3). This finding is in agreement with previous reports on (h)m5C in mammalian RNAs, which has never been found to be associated with CpG[12]. Our data also suggest that further RNA methyltransferases might contribute to (h)m5C modification in addition to the reported role played in this process by DNMT3A. Of note, NSUN2 was previously reported to be responsible for the m5C modification of vaultRNAs, thus affecting their processing into small RNAs, which are then loaded onto AGO proteins[37]. Furthermore, NSUN2 is also a candidate writer enzyme for m6A deposition on miR-125b[38], confirming that this enzyme is able to bind and process miRNA molecules. These data suggest that NSUN2 deserves further investigation as a strong candidate writer enzyme of m5C in miRNAs. However, we did not observe the specific consensus sequence reported recently in NSUN2 modified mRNAs[34]. This might reflect that NSUN2 is responsible for modification of a subset of the observed non-converted C residues.
MAmBA allows reliable m5C calling with a relatively shallow (10–20 M reads) sequencing per sample. Methylation calling is affected by several factors including miRNA expression level, methylation frequency, overall sequencing depth and bisulphite conversion efficiency. Based on our experimental settings (10–20 M reads/sample, 99% bisulphite conversion efficiency), we estimate that a coverage > 100 reads is sufficient to significantly call a 5% methylation frequency.
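The interplay between coverage, conversion efficiency and detectable methylation frequency can be illustrated with a simple one-sided binomial tail probability. This is a hedged sketch and not the test implemented in bisRNA or MAmBA: it assumes that, at an unmodified site, each read independently retains a C with probability equal to the non-conversion rate (1% for 99% conversion efficiency), and it yields an unadjusted P-value that does not account for multiple testing. The function name is hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


coverage = 100                # reads covering the C residue
non_converted_reads = 5       # 5% observed non-conversion
background_rate = 0.01        # 1 - conversion efficiency (99%)

# Probability of seeing at least 5 non-converted reads out of 100
# if the site were unmodified and only background non-conversion occurred.
p_value = binomial_tail(non_converted_reads, coverage, background_rate)
```

Under these assumptions the unadjusted tail probability is on the order of 10^-3; whether a site remains significant after adjustment depends on the number of sites tested.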
BS-miRNA-seq and MAmBA allowed us to identify a large number of (h)m5C residues in miRNAs, both in commonly used human cell lines and in primary samples. Konno and colleagues[9] suggested that m5C might promote miRNA-AGO interaction. Furthermore, a significant overlap between m5C sites in mRNAs and Argonaute protein-binding sites has been reported[12], although the actual biological significance of this overlap is debated[13]. Notably, m5C has been proposed to increase RNA secondary structure stability[14]; hence, this modification might promote mRNA:miRNA interaction.
BS-miRNA-seq, being based on bisulphite treatment, is not able to discriminate between hm5C and m5C. A high-throughput characterization of hm5C vs m5C modification in human miRNAs is, therefore, beyond the scope of our method. However, our meRIP validation highlights that both m5C and hm5C modifications are present in miRNAs in human cells. The different enrichments observed for m5C and hm5C do not necessarily reflect the actual abundances of these two modifications in miRNAs, and might, at least in part, be ascribed to the different affinities of the two antibodies for their epitopes. Further investigation, possibly relying on dedicated techniques, will be needed to dissect the relative abundance of m5C and hm5C in human miRNAs, as well as to identify the methyltransferases and methylcytosine dioxygenases involved in the deposition of these modifications in miRNAs.
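The limitation described above can be made concrete with a toy simulation of idealized bisulphite conversion, in which unmodified C is read as T while both m5C and hm5C are read as C. The function `bisulphite_read`, the example sequence and the modified position are hypothetical, and the model assumes 100% conversion efficiency.

```python
def bisulphite_read(seq, modifications):
    """Simulate the sequenced read of an RNA after ideal bisulphite treatment.

    seq: RNA sequence (A, C, G, U).
    modifications: dict mapping 0-based C positions to 'm5C' or 'hm5C'.
    Unmodified C is converted to U and sequenced as T; modified C stays C.
    """
    read = []
    for i, base in enumerate(seq.upper()):
        if i in modifications and base != "C":
            raise ValueError(f"position {i} is not a cytosine")
        if base == "C" and i not in modifications:
            read.append("T")  # converted cytosine
        elif base == "U":
            read.append("T")  # uridine is sequenced as T in cDNA
        else:
            read.append(base)
    return "".join(read)


# Hypothetical sequence with one modified cytosine at 0-based position 2.
seq = "UACGUCCAGC"
read_m5c = bisulphite_read(seq, {2: "m5C"})
read_hm5c = bisulphite_read(seq, {2: "hm5C"})
read_unmodified = bisulphite_read(seq, {})
print(read_m5c, read_hm5c, read_unmodified)
```

The two modified molecules yield identical reads, which differ from the unmodified one only at the protected position. This is why an orthogonal assay, such as the antibody-based validation discussed above, is required to tell m5C from hm5C.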
Acknowledgments
The 32E2 monoclonal antibody was a generous gift from Prof. Gunter Meister (Regensburg Center for Biochemistry, Laboratory for RNA Biology, University of Regensburg).
Funding Statement
This work was supported by EPIGEN (Flagship Epigenome project MIUR CNR) to VF and by Sapienza University of Rome grant RM11816429252FCF to VF. VDP is supported by a PRIN 2017 scholarship. EL is supported by a PhD scholarship from the Ministero Italiano per l’Università e la Ricerca (MIUR).
Disclosure of potential conflicts of interest
No potential conflict of interest was reported by the author(s).
Author contributions
VF, IL, CC conceived the study. CC, IL, EL and VDP performed experiments. VF conceived MAmBA software. EL and VF analysed data. GA purified PBMC from healthy donor.
Availability
MAmBA with detailed instructions for use can be downloaded at https://github.com/flcvlr/MAmBA
Annotation files for running MAmBA on Homo sapiens and Mus musculus datasets, as well as example datasets for installation testing, are available at https://sites.google.com/a/uniroma1.it/valeriofulci-eng/software
Accession numbers
HeLaS3 and HEK293T NGS libraries have been submitted to SRA (accession ID: PRJNA637504). Reviewers can access data at this link: https://dataview.ncbi.nlm.nih.gov/object/PRJNA637504?reviewer=6t69e78oofvt2uepm0ltd98etd
In order to preserve privacy of healthy donors, PBMC NGS data were not submitted to a public repository. These data will be made available upon reasonable request.
Supplementary Material
Supplemental data for this article can be accessed here.
References
- [1]. Gilbert WV, Bell TA, Schaening C. Messenger RNA modifications: form, distribution, and function. Science. 2016;352(1):1408–1412.
- [2]. Zhao BS, Roundtree IA, He C. Post-transcriptional gene regulation by mRNA modifications. Nat Rev Mol Cell Biol. 2017;18(1):31–42.
- [3]. Erson-Bensan AE, Begik O. m6A Modification and Implications for microRNAs. MicroRNA. 2017;6:97–101.
- [4]. Jones MR, Quinton LJ, Blahna MT, et al. Zcchc11-dependent uridylation of microRNA directs cytokine expression. Nat Cell Biol. 2009;11(9):1157–1163.
- [5]. Blow MJ, Grocock RJ, Van Dongen S, et al. RNA editing of human microRNAs. Genome Biol. 2006;7(4):R27.
- [6]. Gutiérrez-Vázquez C, Enright AJ, Rodríguez-Galán A, et al. 3ʹ Uridylation controls mature microRNA turnover during CD4 T-cell activation. RNA. 2017;23(6):882–891.
- [7]. Cesarini V, Silvestris DA, Tassinari V, et al. ADAR2/miR-589-3p axis controls glioblastoma cell migration/invasion. Nucleic Acids Res. 2018;46(4):2045–2059.
- [8]. Wang Y, Liang H. When MicroRNAs Meet RNA Editing in Cancer: a Nucleotide Change Can Make a Difference. BioEssays. 2018;40(2).
- [9]. Konno M, Koseki J, Asai A, et al. Distinct methylation levels of mature microRNAs in gastrointestinal cancers. Nat Commun. 2019;10(1):3888.
- [10]. Cheray M, Etcheverry A, Jacques C, et al. Cytosine methylation of mature microRNAs inhibits their functions and is associated with poor prognosis in glioblastoma multiforme. Mol Cancer. 2020;19(1):36.
- [11]. Schaefer M, Pollex T, Hanna K, et al. RNA cytosine methylation analysis by bisulfite sequencing. Nucleic Acids Res. 2008;37(2):e12.
- [12]. Squires JE, Patel HR, Nousch M, et al. Widespread occurrence of 5-methylcytosine in human coding and non-coding RNA. Nucleic Acids Res. 2012;40(11):5023–5033. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3367185/
- [13]. Amort T, Rieder D, Wille A, et al. 5-methylcytosine profiles in poly(A) RNA from mouse embryonic stem cells and brain. Genome Biol. 2017;18(1):1.
- [14]. Bohnsack KE, Höbartner C, Bohnsack MT. Eukaryotic 5-methylcytosine (m5C) RNA Methyltransferases: mechanisms, Cellular Functions, and Links to Disease. Genes (Basel). 2019;10(2):102. Available from: https://www.mdpi.com/2073-4425/10/2/102
- [15]. Yang X, Yang Y, Sun B-F, et al. 5-methylcytosine promotes mRNA export - NSUN2 as the methyltransferase and ALYREF as an m5C reader. Cell Res. 2017;27(5):606–625.
- [16]. Fu L, Guerrero CR, Zhong N, et al. Tet-mediated formation of 5-hydroxymethylcytosine in RNA. J Am Chem Soc. 2014;136(33):11582–11585.
- [17]. Delatte B, Wang F, Ngoc LV, et al. RNA biochemistry. Transcriptome-wide distribution and function of RNA hydroxymethylcytosine. Science. 2016;351(6270):282–285.
- [18]. Laudadio I, Orso F, Azzalin G, et al. AGO2 promotes telomerase activity and interaction between the telomerase components TERT and TERC. EMBO Rep. 2019;20(2):20.
- [19]. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011;17(1):10–12. Available from: http://journal.embnet.org/index.php/embnetjournal/article/view/200
- [20]. Krueger F, Andrews SR. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics. 2011;27(11):1571–1572.
- [21]. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357–359. Available from: http://www.nature.com/nmeth/journal/v9/n4/full/nmeth.1923.html
- [22]. The RNAcentral Consortium. RNAcentral: a hub of information for non-coding RNA sequences. Nucleic Acids Res. 2019;47(D1):D221–D229.
- [23]. Kim D, Pertea G, Trapnell C, et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14(4):R36. Available from: http://genomebiology.com/2013/14/4/R36/abstract
- [24]. Frankish A, Diekhans M, Ferreira A-M, et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 2019;47(D1):D766–D773.
- [25]. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841–842. Available from: http://bioinformatics.oxfordjournals.org/content/26/6/841
- [26]. Legrand C, Tuorto F, Hartmann M, et al. Statistically robust methylation calling for whole-transcriptome bisulfite sequencing reveals distinct methylation patterns for mouse RNAs. Genome Res. 2017;27(9):1589–1596. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5580717/
- [27]. Tange O. GNU Parallel: The Command-Line Power Tool. ;login:. 2011;36(1):42. Available from: https://zenodo.org/record/16303/export/xd
- [28]. Bailey TL, Boden M, Buske FA, et al. MEME Suite: tools for motif discovery and searching. Nucleic Acids Res. 2009;37:W202–W208. Available from: https://academic.oup.com/nar/article/37/suppl_2/W202/1135092
- [29]. Crooks GE, Hon G, Chandonia J-M, et al. WebLogo: a sequence logo generator. Genome Res. 2004;14(6):1188–1190.
- [30]. Blanco S, Bandiera R, Popis M, et al. Stem cell function and stress response are controlled by protein synthesis. Nature. 2016;534(7607):335–340.
- [31]. Khoddami V, Cairns BR. Identification of direct targets and modified bases of RNA cytosine methyltransferases. Nat Biotechnol. 2013;31(5):458–464.
- [32]. Kozomara A, Griffiths-Jones S. miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res. 2011;39:D152–D157. Available from: http://nar.oxfordjournals.org/content/39/suppl_1/D152
- [33]. Schumann U, Zhang H-N, Sibbritt T, et al. Multiple links between 5-methylcytosine content of mRNA and translation. BMC Biol. 2020;18(1):40. Available from: https://doi.org/10.1186/s12915-020-00769-5
- [34]. Huang T, Chen W, Liu J, et al. Genome-wide identification of mRNA 5-methylcytosine in mammals. Nat Struct Mol Biol. 2019;26(5):380–388.
- [35]. Weichmann F, Hett R, Schepers A, et al. Validation strategies for antibodies targeting modified ribonucleotides. RNA. 2020;26(10):1489–1506.
- [36]. Warnecke PM, Stirzaker C, Song J, et al. Identification and resolution of artifacts in bisulfite sequencing. Methods. 2002;27(2):101–107. Available from: http://www.sciencedirect.com/science/article/pii/S1046202302000609
- [37]. Hussain S, Sajini AA, Blanco S, et al. NSun2-Mediated Cytosine-5 Methylation of Vault Noncoding RNA Determines Its Processing into Regulatory Small RNAs. Cell Rep. 2012;4(2):255–261. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3730056/
- [38]. Yuan S, Tang H, Xing J, et al. Methylation by NSun2 Represses the Levels and Function of MicroRNA 125b. Mol Cell Biol. 2014;34(19):3630–3641. Available from: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4187725/