Bioinformatics. 2022 Sep 5;38(20):4705–4712. doi: 10.1093/bioinformatics/btac601

Differential RNA methylation analysis for MeRIP-seq data under general experimental design

Zhenxing Guo, Andrew M Shafik, Peng Jin, Hao Wu
Editor: Christina Kendziorski
PMCID: PMC9563684  PMID: 36063045

Abstract

Motivation

RNA epigenetics is an emerging field that studies post-transcriptional gene regulation. The dynamics of RNA epigenetic modifications have been reported to be associated with many human diseases. A recently developed high-throughput technology, Methylated RNA Immunoprecipitation Sequencing (MeRIP-seq), enables transcriptome-wide profiling of N6-methyladenosine (m6A) modification and comparison of RNA epigenetic modifications. A few computational methods exist for comparing mRNA modifications under different conditions, but they all suffer from serious limitations.

Results

In this work, we develop a novel statistical method to detect differentially methylated mRNA regions from MeRIP-seq data. We model the sequence count data by a hierarchical negative binomial model that accounts for various sources of variations and derive parameter estimation and statistical testing procedures for flexible statistical inferences under general experimental designs. Extensive benchmark evaluations in simulation and real data analyses demonstrate that our method is more accurate, robust and flexible compared to existing methods.

Availability and implementation

Our method TRESS is implemented as an R/Bioconductor package and is available at https://bioconductor.org/packages/devel/TRESS.

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

Compared with the well-studied DNA methylation and histone modification, RNA modification has emerged as a very active research field in epigenetics over the past several years (Liu et al., 2020; Roundtree et al., 2017). There are several types of RNA modifications, including N6-methyladenosine (m6A), N3-methylcytosine (m3C) and N1-methyladenosine (m1A). Among them, m6A is by far the best-known modification, as it is the most prevalent internal modification in eukaryotic mRNA (Dominissini et al., 2012). m6A has been shown to be associated with many human diseases such as cancer and neuronal disorders (Engel et al., 2018; Lan et al., 2019; Lin et al., 2019; Shafik et al., 2021). For example, in some cancers, increased m6A methylation appears to enhance the translation of oncogenes or to degrade tumor suppressor genes (He et al., 2019). Studying the dynamics of m6A modification in various diseases can shed light on the biological mechanisms and progression of disease, and potentially identify diagnostic biomarkers and therapeutic targets.

Methylated RNA Immunoprecipitation Sequencing (MeRIP-seq) is a high-throughput sequencing technology that enables transcriptome-wide profiling of m6A modifications. MeRIP-seq is a capture-based sequencing technology, similar to ChIP-seq (Park, 2009). The data for each subject from a standard MeRIP-seq experiment consist of paired input control and immunoprecipitation (IP) samples. The input control data are essentially an RNA-seq experiment, which measures the mRNA abundance or gene expression. The IP step captures mRNA fragments carrying the m6A modification using a specific m6A antibody; the IP data are then obtained by high-throughput sequencing of the IP’ed mRNA fragments and measure the post-IP mRNA abundance. The signal of interest in these data is, roughly speaking, the ratio between the IP and input signals, which represents the transcriptome-wide m6A methylation level. Similar to DNA methylation analysis, one of the most fundamental goals in MeRIP-seq data analysis is to identify differentially methylated regions (DMRs), i.e. to detect genomic regions whose m6A levels are significantly associated with the outcome, e.g. regions that change significantly from the normal to the disease condition. DMRs can be functionally related to transcriptional regulation, as they may promote or inhibit gene expression, which can subsequently affect phenotypes or the development of diseases. Therefore, differential RNA methylation analysis has important biological and clinical applications. For example, it can potentially discover biomarkers for disease diagnosis at an early stage, and help identify disease subtypes.

A few methods have been developed for differential methylation analysis of MeRIP-seq data (Cui et al., 2018; Liu et al., 2017; Meng et al., 2013; Zhang et al., 2019), but all suffer from various limitations. The earliest method, exomePeak (Meng et al., 2013), performs Fisher’s exact tests on the normalized average read counts from two groups of IP/input samples to identify differentially methylated regions. There are several problems with such an approach. First, it ignores the biological variation among replicates, since the replicated data are pooled together. Second, Fisher’s exact test on a 2 × 2 table implicitly assumes that the sequence reads are independent. These problems lead to very liberal statistical inference and result in many false positives. As an extension of exomePeak, MeTDiff (Cui et al., 2018) models biological variation with beta-binomial distributions. It assumes that the ratio of the raw IP count to the sum of the raw IP and input counts follows a beta distribution, ignoring the impact of sequencing depth. Such an approach brings mathematical convenience, but also leads to biased estimates and inaccurate DMR detection. QNB (Liu et al., 2017) models the effect of sequencing depth and detects differential m6A regions based on four negative binomial distributions. One problem is that it takes the within-IP and within-input variances as the variation of the biological signals. That approach is undesirable because, in MeRIP-seq data, the signal of interest is the normalized IP/input ratio, and the variance of such ratios is the biological variation of methylation levels. Without properly modeling the data, QNB may generate undesired results in DMR detection.

It is also worth noting that all three aforementioned methods only work for two-group comparisons, and none of them can adjust for potential confounding factors (such as age and gender) in a complex design. A more recent method, RADAR (Zhang et al., 2019), is based on a Poisson linear random effect model and works for general experimental designs. One concern with RADAR is that it assumes a Poisson distribution for the preprocessed read counts. It is normal to model raw read counts as Poisson, but after preprocessing the data are no longer counts, and thus the Poisson assumption is invalid. Another limitation of RADAR is that it is not convenient to test different covariates in the linear model. To test a different covariate, one has to provide a new design and refit the model, which greatly increases the computational burden.

In this work, we develop a novel statistical method to identify differentially methylated m6A regions from MeRIP-seq data. We model the sequencing counts by a hierarchical negative binomial model, which accounts for both technical and biological variation. The methylation levels are linked to the experimental design through a linear model, allowing flexible statistical inference. We design a Wald test for hypothesis testing, which identifies DMRs based on specified contrasts of interest. The contrasts can be any linear combinations of the experimental factors. We perform extensive simulation studies and real data applications, and demonstrate that our method provides more accurate DMR detection and statistical inference than existing methods. Our method is implemented in an R package TRESS (Toolbox for mRNA Epigenetics Sequencing analysiS) and is freely available at https://bioconductor.org/packages/devel/TRESS.

2 Materials and methods

Our method identifies DMRs in a two-step approach. First, we call m6A methylated regions (referred to as ‘peaks’ hereafter) from each pair of IP and input data separately. This step is performed using our previously developed peak calling method, which is also implemented in TRESS (Guo et al., 2021). We then take the union of the called peaks from all samples to obtain a list of genomic regions, which are considered ‘candidate DMRs’. The true DMRs must be contained in the candidate DMRs, since the rest of the transcriptome is non-methylated in all samples and thus cannot contain DMRs. This first step is important because, by filtering out a majority of the regions, it greatly reduces the computational burden and the number of tests subject to multiple testing correction. After obtaining the candidate DMRs, we design rigorous statistical methods for modeling the sequence counts and conducting statistical inference for DMR detection.
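The union step can be illustrated with a short interval-merging sketch. This is not the TRESS implementation: the function name `union_peaks`, the `(chrom, start, end)` tuple format with half-open coordinates, and the choice to merge overlapping or touching peaks are all assumptions made for illustration.

```python
from collections import defaultdict


def union_peaks(peaks_per_sample):
    """Merge peaks from all samples into non-overlapping candidate regions.

    peaks_per_sample: one list per sample of (chrom, start, end) tuples,
    with half-open coordinates [start, end). This format is hypothetical.
    """
    by_chrom = defaultdict(list)
    for peaks in peaks_per_sample:
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end))

    candidates = []
    for chrom in sorted(by_chrom):
        merged = []
        for start, end in sorted(by_chrom[chrom]):
            if merged and start <= merged[-1][1]:
                # Overlaps (or touches) the previous region: extend it.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        candidates.extend((chrom, s, e) for s, e in merged)
    return candidates


# Two toy samples with partially overlapping peaks on chr1.
sample1 = [("chr1", 100, 200), ("chr2", 50, 80)]
sample2 = [("chr1", 150, 260), ("chr1", 400, 450)]
candidates = union_peaks([sample1, sample2])
```

In this toy input the two overlapping chr1 peaks collapse into a single candidate region, while the peaks seen in only one sample are kept as they are.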

2.1 Statistical model for read counts in the candidate DMRs

Suppose the dataset contains N samples, each with data from one input and one IP experiment. Assume there are M candidate DMRs. Let $X_{ij}$ and $Y_{ij}$ denote the read counts in the ith candidate DMR and jth sample, from the input and IP data, respectively. We first assume that the read counts follow Poisson distributions: $X_{ij} \mid \lambda_{ij}^x \sim \mathrm{Poisson}(s_j^x \lambda_{ij}^x)$ and $Y_{ij} \mid \lambda_{ij}^y \sim \mathrm{Poisson}(s_j^y \lambda_{ij}^y)$, where the λ’s are the underlying Poisson rates, and $s_j^x$ and $s_j^y$ are the corresponding size factors that account for variation in sequencing depth. To model the biological variation of counts in a candidate DMR across different samples, we assume $\lambda_{ij}^y \sim \mathrm{Gamma}(\alpha_{ij}^y, \theta_i)$ and $\lambda_{ij}^x \sim \mathrm{Gamma}(\alpha_{ij}^x, \theta_i)$. Here, the α’s and θ are the shape and scale parameters of the Gamma distributions. It is important to note that we assume a common scale parameter for all samples in the same candidate region ($\theta_i$ is indexed only by i) for mathematical convenience in the derivation shown below. We will show by a series of sensitivity analyses that this assumption has little impact on the results. With these model assumptions, the read counts marginally follow a negative binomial (Gamma–Poisson) distribution, which is a commonly used data model for various types of sequencing data.
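For reference, the marginal distribution implied by these assumptions can be written out explicitly. The block below is the standard Gamma–Poisson mixture result, stated for the input count with Gamma shape $\alpha_{ij}^x$ and scale $\theta_i$; it is a worked restatement and not a formula quoted from the paper. The IP count follows the same form with the superscript x replaced by y.

```latex
\begin{aligned}
P(X_{ij} = k)
  &= \int_0^\infty \mathrm{Poisson}(k;\, s_j^x \lambda)\,
     \mathrm{Gamma}(\lambda;\, \alpha_{ij}^x, \theta_i)\, d\lambda \\
  &= \frac{\Gamma(k + \alpha_{ij}^x)}{k!\,\Gamma(\alpha_{ij}^x)}
     \left(\frac{s_j^x \theta_i}{1 + s_j^x \theta_i}\right)^{k}
     \left(\frac{1}{1 + s_j^x \theta_i}\right)^{\alpha_{ij}^x},
     \qquad k = 0, 1, 2, \ldots \\
E(X_{ij}) &= s_j^x \alpha_{ij}^x \theta_i, \qquad
\mathrm{Var}(X_{ij}) = s_j^x \alpha_{ij}^x \theta_i \,(1 + s_j^x \theta_i)
  = E(X_{ij}) + \frac{E(X_{ij})^2}{\alpha_{ij}^x}.
\end{aligned}
```

The variance exceeds the mean by a term that grows quadratically in the mean, which is the overdispersion that a plain Poisson model cannot capture.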

We define the quantity $\lambda_{ij}^y/(\lambda_{ij}^x+\lambda_{ij}^y)$ to represent the methylation level of candidate DMR i in sample j. Note that this quantity is not a direct measurement of the methylation level, but is monotone in it. Based on our model assumptions for the λ’s, $\lambda_{ij}^y/(\lambda_{ij}^x+\lambda_{ij}^y)$ naturally follows a beta distribution $\mathrm{Beta}(\alpha_{ij}^y, \alpha_{ij}^x)$. We reparameterize this beta distribution by its mean ($\mu_{ij}$) and dispersion ($\phi_{ij}$), with $\mu_{ij} = \alpha_{ij}^y/(\alpha_{ij}^x+\alpha_{ij}^y)$ and $\phi_{ij} = 1/(\alpha_{ij}^x+\alpha_{ij}^y+1)$. We assume that all samples share the same dispersion of methylation at candidate DMR i, i.e. $\phi_{ij} = \phi_i$ for all j. Then the α’s in the Gamma distributions can be expressed in terms of $\mu_{ij}$ and $\phi_i$: $\alpha_{ij}^x = (1-\mu_{ij})(\phi_i^{-1}-1)$ and $\alpha_{ij}^y = \mu_{ij}(\phi_i^{-1}-1)$.
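The mapping between the two parameterizations is easy to check numerically. The sketch below uses hypothetical function names and simply encodes the formulas above; the variance expression $\mu(1-\mu)\phi$ is the standard variance of a beta distribution rewritten in terms of the mean and dispersion.

```python
def alphas_from_mu_phi(mu, phi):
    """Return (alpha_x, alpha_y) for mean mu and dispersion phi, both in (0, 1)."""
    if not (0.0 < mu < 1.0 and 0.0 < phi < 1.0):
        raise ValueError("mu and phi must lie strictly between 0 and 1")
    total = 1.0 / phi - 1.0  # alpha_x + alpha_y
    return (1.0 - mu) * total, mu * total


def mu_phi_from_alphas(alpha_x, alpha_y):
    """Inverse mapping: recover (mu, phi) from the two Gamma shape parameters."""
    mu = alpha_y / (alpha_x + alpha_y)
    phi = 1.0 / (alpha_x + alpha_y + 1.0)
    return mu, phi


def beta_variance(alpha_x, alpha_y):
    """Variance of Beta(alpha_y, alpha_x), the methylation-level distribution."""
    total = alpha_x + alpha_y
    return alpha_x * alpha_y / (total ** 2 * (total + 1.0))


alpha_x, alpha_y = alphas_from_mu_phi(0.3, 0.05)
mu_back, phi_back = mu_phi_from_alphas(alpha_x, alpha_y)
```

With mean 0.3 and dispersion 0.05 the shapes sum to 19, and the round trip recovers the original mean and dispersion. A larger dispersion means a smaller shape total and therefore more variable methylation levels across samples.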

In a differential methylation analysis, one wants to investigate whether experimental factors are associated with methylation levels at some regions. For that, we link the mean methylation $\mu_{ij}$ to the experimental factors of sample j through a logit linear model: $\mathrm{logit}(\mu_{ij}) = \alpha_i + z_j^T \beta_i$. Here, $\alpha_i$ reflects the baseline methylation of candidate DMR i, $z_j$ is a p-vector containing the factors of interest, and $\beta_i$ is a p-vector of coefficients. Similar to other sequencing data analyses, such as RNA-seq differential expression, the dispersion parameter $\phi_i$ plays an important role in the statistical inference in such a model. When the number of biological replicates is small, it is desirable to impose a prior on $\phi_i$, which introduces a ‘shrinkage’ effect that provides robust parameter estimation and stable statistical inference. Here, we impose a log-normal prior on $\phi_i$, which is based on the distribution of observed dispersions in MeRIP-seq data. Let $R_i = (\alpha_i, \beta_i^T)^T$ and $D_j = (1, z_j^T)^T$; then, based on the above settings, our complete data model for counts in candidate DMRs is:

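$$
\begin{aligned}
X_{ij} \mid \lambda_{ij}^x &\sim \mathrm{Poisson}(s_j^x \lambda_{ij}^x), \\
Y_{ij} \mid \lambda_{ij}^y &\sim \mathrm{Poisson}(s_j^y \lambda_{ij}^y), \\
\lambda_{ij}^x \mid \phi_i &\sim \mathrm{Gamma}\big((1-\mu_{ij})(\phi_i^{-1}-1),\ \theta_i\big), \\
\lambda_{ij}^y \mid \phi_i &\sim \mathrm{Gamma}\big(\mu_{ij}(\phi_i^{-1}-1),\ \theta_i\big), \\
\phi_i &\sim \mathrm{logN}(m, \nu^2), \\
\mathrm{logit}(\mu_{ij}) &= D_j^T R_i.
\end{aligned}
\tag{1}
$$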
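To make the generative structure of model (1) concrete, the sketch below draws input and IP counts for a single candidate region. It is an illustration under stated assumptions rather than the authors' simulation code: the dispersion is passed in as a fixed value instead of being drawn from the log-normal prior, the function names are hypothetical, and the Poisson draw uses Knuth's simple multiplication sampler, which is only suitable for moderate rates.

```python
import math
import random


def expit(x):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))


def rpois(rate, rng):
    """Knuth's Poisson sampler; adequate for the moderate rates used here."""
    limit, k, prod = math.exp(-rate), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_region(design, coef, phi, theta, size_x, size_y, seed=0):
    """Draw (input, IP) counts per sample for one region under model (1).

    design: rows D_j = (1, z_j...); coef: R_i = (alpha_i, beta_i...).
    phi is held fixed here (assumption); theta is the shared Gamma scale.
    """
    rng = random.Random(seed)
    total = 1.0 / phi - 1.0  # alpha_x + alpha_y
    counts = []
    for d_j, sx, sy in zip(design, size_x, size_y):
        mu = expit(sum(d * r for d, r in zip(d_j, coef)))
        # random.gammavariate takes (shape, scale).
        lam_x = rng.gammavariate((1.0 - mu) * total, theta)
        lam_y = rng.gammavariate(mu * total, theta)
        counts.append((rpois(sx * lam_x, rng), rpois(sy * lam_y, rng)))
    return counts


# Two untreated and two treated samples; all values are made up.
design = [(1, 0), (1, 0), (1, 1), (1, 1)]
size_x = [1.0, 1.2, 0.9, 1.1]
size_y = [0.8, 1.0, 1.1, 0.9]
counts = simulate_region(design, coef=(-1.0, 1.5), phi=0.05, theta=2.0,
                         size_x=size_x, size_y=size_y)
```

Each element of `counts` is an (input, IP) pair for one sample. With a positive treatment coefficient, the treated samples tend to have a larger share of reads in the IP count.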

With this data model, differential methylation detection for a factor of interest can be achieved by testing the corresponding element of $R_i$. For each candidate DMR, we estimate the model parameters and perform hypothesis testing. The candidate DMRs can then be ranked by statistical significance.

It is worth discussing the similarities and differences between our model and some existing models. First, we model the read counts for both IP and input by Poisson distributions, as RADAR does. The difference is that RADAR assumes a Poisson distribution for the preprocessed rather than the raw read counts. The preprocessing in RADAR starts with library size normalization and ends with a pre-IP (or input) count adjustment, resulting in a quantity analogous to $(Y_{ij}/s_j^y)/(X_{ij}/s_j^x)$ in our model (1). The Poisson assumption for such preprocessed data in RADAR is inappropriate. Second, our model reduces to a beta-binomial GLM when there are no sequencing depth effects. In that case, $s_j^x = s_j^y = 1$, so $Y_{ij} \mid X_{ij}+Y_{ij}=T_{ij} \sim \mathrm{Binomial}(T_{ij}, p_{ij})$ and $p_{ij} \sim \mathrm{Beta}(\mu_{ij}, \phi_i)$ (parameterized by mean and dispersion), with $\mathrm{logit}(\mu_{ij})$ defined as in (1). In fact, it is the sequencing depth effects that prevent one from directly modeling the data by a beta-binomial distribution and using a beta-binomial GLM. MeTDiff assumes a beta-binomial distribution for the counts at the cost of ignoring the sequencing depth effects, which is clearly undesirable. Therefore, our model is more general than a beta-binomial GLM.

2.2 Parameter estimation

The parameters for every candidate DMR i in the above model include $R_i$, $\phi_i$ and $\theta_i$. We adopt maximum likelihood approaches to obtain estimates. The algorithm is summarized in Table 1. Briefly, we first obtain initial estimates of $R_i$ by linear regression, treating $\mathrm{logit}\big(\frac{Y_{ij}/s_j^y}{Y_{ij}/s_j^y + X_{ij}/s_j^x}\big)$ as observed data, and initial estimates of $\phi_i$ and $\theta_i$ by the method of moments. Next, we iteratively update $R_i$, $\phi_i$ and $\theta_i$ by maximizing the data log-likelihood $l_i(R_i, \phi_i, \theta_i; X_{i\cdot}, Y_{i\cdot})$, where $X_{i\cdot} = (X_{i1}, X_{i2}, \ldots, X_{iN})$ and $Y_{i\cdot} = (Y_{i1}, Y_{i2}, \ldots, Y_{iN})$. Finally, given $\hat\phi_i$, $\hat\theta_i$ and $\hat R_i$, we update $\phi_i$ with its posterior mode ($\tilde\phi_i$), obtained by maximizing the sum of the log data likelihood and the log prior of $\phi_i$. Detailed mathematical derivations for all parameter estimation procedures are provided in Supplementary Material Section S1.

Table 1.

Parameter estimate for candidate DMR i

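Input: Read count matrices X and Y, design matrix $D^T = (D_1, D_2, \ldots, D_N)$, and size factors $s_j^x, s_j^y$ for j = 1, 2, …, N.
Output: $\hat R_i = (\hat\alpha_i, \hat\beta_i^T)^T$ and $\widehat{\mathrm{Cov}}(\hat R_i)$.
1. Initialization: $\hat R_i^{(0)} = (D^T D)^{-1} D^T O_i$, with $O_{ij} = \mathrm{logit}\big(\frac{Y_{ij}/s_j^y}{X_{ij}/s_j^x + Y_{ij}/s_j^y}\big)$ and $O_i = (O_{i1}, O_{i2}, \ldots, O_{iN})^T$.
 Obtain $\hat\phi_i^{(0)}$ and $\hat\theta_i^{(0)}$ by the method of moments.
 Let t = 0.
2. Iteratively compute the MLE of $\phi_i$, $\theta_i$ and $R_i$. In particular, for t = 1, 2, …:
 (i). Given $\hat R_i^{(t-1)}$, $\hat\phi_i^{(t-1)}$ and $\hat\theta_i^{(t-1)}$, obtain
    $\hat R_i^{(t)} = \arg\max_R\, l_i(R; X_{i\cdot}, Y_{i\cdot}, \hat\theta_i^{(t-1)}, \hat\phi_i^{(t-1)})$
  by the Newton–Raphson algorithm.
 (ii). Given $\hat R_i^{(t)}$, $\hat\phi_i^{(t-1)}$ and $\hat\theta_i^{(t-1)}$, obtain
    $(\hat\phi_i^{(t)}, \hat\theta_i^{(t)}) = \arg\max_{\phi,\theta}\, l_i(\phi, \theta; X_{i\cdot}, Y_{i\cdot}, \hat R_i^{(t)})$.
 Repeat the two steps above until convergence, and let $\hat R_i = \hat R_i^{(t^*)}$, $\hat\phi_i = \hat\phi_i^{(t^*)}$ and $\hat\theta_i = \hat\theta_i^{(t^*)}$, where $t^*$ is the number of iterations.
3. Estimate the posterior mode of $\phi_i$ given $\hat R_i$ and $\hat\theta_i$:
 (i). With $\hat\phi_i$ estimated for all candidate DMRs in Step 2, estimate m and ν by the method of moments, denoted $\hat m$ and $\hat\nu$.
 (ii). Then,
  $\tilde\phi_i = \arg\max_\phi \big[\, l_i(\phi; X_{i\cdot}, Y_{i\cdot}, \hat R_i, \hat\theta_i) + \log \mathrm{logN}(\phi; \hat m, \hat\nu^2) \big]$.
4. Calculate $\widehat{\mathrm{Cov}}(\hat R_i)$ as the inverse of the observed Fisher information:
  $\Big(-\frac{\partial^2 l_i}{\partial R_i\, \partial R_i^T}\Big)^{-1}\Big|_{R_i=\hat R_i,\ \theta_i=\hat\theta_i,\ \phi_i=\tilde\phi_i}$.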
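Step 1 of Table 1 can be sketched for the simplest case of an intercept plus a single covariate, where the least-squares solution has a closed form. The function names are hypothetical, the input count is normalized by the input size factor, and the sketch assumes all counts are positive because the source does not say how zero counts are handled at this step. Steps 2 to 4 (Newton–Raphson updates, shrinkage and the information matrix) are not shown.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def initial_coefficients(x, y, size_x, size_y, z):
    """Least-squares start values (alpha, beta) for one candidate region.

    x, y: input and IP counts per sample (assumed positive here).
    z: a single covariate per sample, e.g. 0/1 treatment labels.
    """
    # Observed logit methylation level from depth-normalized counts.
    o = []
    for xij, yij, sx, sy in zip(x, y, size_x, size_y):
        nx, ny = xij / sx, yij / sy
        o.append(logit(ny / (nx + ny)))

    # Closed-form simple linear regression of o on z with an intercept,
    # equivalent to (D^T D)^{-1} D^T O for D_j = (1, z_j).
    n = len(o)
    z_bar, o_bar = sum(z) / n, sum(o) / n
    s_zz = sum((zj - z_bar) ** 2 for zj in z)
    s_zo = sum((zj - z_bar) * (oj - o_bar) for zj, oj in zip(z, o))
    beta = s_zo / s_zz
    alpha = o_bar - beta * z_bar
    return alpha, beta


# Toy region: methylation ratio 0.25 in controls and 0.75 in treated samples.
alpha0, beta0 = initial_coefficients(
    x=[30, 30, 10, 10], y=[10, 10, 30, 30],
    size_x=[1.0, 1.0, 1.0, 1.0], size_y=[1.0, 1.0, 1.0, 1.0],
    z=[0, 0, 1, 1],
)
```

For this toy input the start values are the logit of 0.25 for the intercept and the difference between the two group logits for the treatment effect.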

2.3 Hypothesis testing

Our model provides a flexible means of hypothesis testing through the linear model framework. Once the model is fit, any linear combination of the model coefficients, specified by the user as a (p+1)-vector C, can be tested. Specifically, given C, we test the null hypothesis $H_0: C^T R_i = 0$ for all candidate DMRs (indexed by i) using a Wald test.

For candidate DMR i, the Wald statistic is calculated as $T_W = C^T \hat R_i \,/\, \widehat{\mathrm{SE}}(C^T \hat R_i)$. Here, $\widehat{\mathrm{SE}}(C^T \hat R_i) = \sqrt{C^T\, \widehat{\mathrm{Cov}}(\hat R_i)\, C}$, with $\widehat{\mathrm{Cov}}(\hat R_i)$ being the inverse of the observed Fisher information matrix of $R_i$ evaluated at $\hat R_i = (\hat\alpha_i, \hat\beta_i^T)^T$. P-values are calculated by assuming $T_W \sim N(0,1)$ under the null, and are then adjusted for multiple testing using the false discovery rate (FDR) procedure (Benjamini and Hochberg, 1995).
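The test and the multiple-testing adjustment can be sketched as follows. The function names are hypothetical and the coefficient estimate and covariance matrix are made-up inputs standing in for the output of the fitting step. The two-sided P-value is an assumption, since the text only states that the statistic is treated as standard normal under the null.

```python
import math


def wald_test(contrast, coef, cov):
    """Wald statistic and two-sided normal P-value for H0: C^T R = 0."""
    est = sum(c * r for c, r in zip(contrast, coef))
    k = len(contrast)
    var = sum(contrast[a] * cov[a][b] * contrast[b]
              for a in range(k) for b in range(k))
    stat = est / math.sqrt(var)
    # Two-sided tail probability of N(0, 1); the sidedness is an assumption.
    pval = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, pval


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up fit for one region: R = (alpha, beta) and its covariance matrix.
coef_hat = (-1.0, 1.96)
cov_hat = [[0.5, 0.0], [0.0, 1.0]]
stat, pval = wald_test((0.0, 1.0), coef_hat, cov_hat)   # test beta = 0
stat_a, pval_a = wald_test((1.0, 0.0), coef_hat, cov_hat)  # reuse the same fit
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Because the contrast enters only through `wald_test`, the same estimate and covariance can be reused for any number of contrasts without refitting.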

Our procedure separates the model fitting, which is the most computationally heavy part, from the hypothesis testing. Given an experimental design with multiple factors, the parameter estimation (model fitting) only needs to be performed once, and then the hypothesis testing for DMR calling can be performed for different factors efficiently. This is an added advantage over existing methods such as exomePeak, MeTDiff or RADAR.

2.4 Software implementation

Our method is implemented as an R package, TRESS, and the implementation is computationally efficient. For a dataset with 10 000 candidate DMRs, two treatment groups and two replicates in each group, TRESS takes 3.48 min using 4 cores on a MacBook Pro laptop with an i5 2.3 GHz CPU and 16 GB RAM. The computational time is approximately linear in the number of candidate DMRs and the number of samples.

3 Results

We first apply TRESS to simulated data to evaluate its performance in various aspects, including parameter estimation, DMR calling accuracy and statistical inference. The data were simulated based on a mouse brain dataset in order to mimic real data characteristics (Supplementary Figs S1 and S2). The simulation involves two general scenarios: a simple two-group comparison, and one with a continuous covariate in addition to group assignment. In each simulation, we assume there are 10 000 candidate DMRs, 10% of which are differentially methylated between treated and untreated conditions. The number of replicates under each condition varies from 2 to 5 in order to assess the effect of sample size on DMR calling. Our overall simulation procedure is summarized in Figure 1. Briefly, methylation levels μ, dispersion ϕ and gamma scale parameter θ are first simulated, based on which normalized Poisson rates λx and λy are randomly sampled from gamma distributions. Given the Poisson rates and size factors, read counts from input and IP samples are randomly sampled from the respective Poisson models. The detailed simulation procedure for each parameter is described in Supplementary Material Section S2.1.

Fig. 1.

Schematic overview of the simulation study

3.1 TRESS generates unbiased and accurate parameter estimates

The parameter estimation plays an important role in statistical inference, with better estimation leading to improved DMR calling accuracy. Here, we evaluate the performance of parameter estimation from TRESS under different simulation settings. In particular, we focus on the estimation of baseline methylation level (α), treatment effect (β) and dispersions (ϕ).

As shown in Figure 2A, for both α and β, most dots are closely clustered and centered around the identity line, indicating no strong systematic bias in parameter estimates for most regions. Also, increased sample size contributes to reduced absolute bias for both α and β (Supplementary Fig. S3A), suggesting that TRESS properly takes advantage of larger sample sizes.

Fig. 2.

Evaluation of parameter estimates by TRESS. (A) Scatter plots comparing true and estimated values of α and β. (B) MSE of the treatment effect (β) estimate. For these boxplots, each panel contains results for a mean of log-dispersion of m = –5 or –4. Different boxes in each panel are obtained using different numbers of replicates. (C) Comparison of the shrinkage estimate and the true value of dispersion, in a scatter plot (top) and histogram (bottom)

As further shown in Figure 2B, the MSE of the β estimates and its variation across all regions decrease substantially when the number of replicates increases from two to five. Although the MSE increases when the mean of log dispersion is larger (panels from left to right in Fig. 2B), this increase is smaller when the sample size is larger. In other words, the loss of accuracy in parameter estimation due to larger biological variation can be compensated for by an increased sample size. Consistent results are observed for the estimation of α (Supplementary Fig. S3B).

We next evaluate the dispersion estimate by TRESS, as it accounts for the uncertainty of the treatment effect estimate and is a crucial component of the Wald test statistic. As shown in Supplementary Figure S3C, although more replicates still result in more accurate and less variable dispersion estimates, the relative increase in MSE under smaller sample sizes is smaller than those in Figure 2B and Supplementary Figure S3B. This is due to the shrinkage estimation procedure implemented in TRESS, where most estimates are shrunk toward the population mean (Fig. 2C) regardless of sample size, making the dispersion estimate robust to sample size. Overall, TRESS generates precise and robust parameter estimates that are essential for downstream statistical inference.

3.2 TRESS provides better ranked DMRs

We next evaluate the DMR calling accuracies of the proposed method, focusing on the ranks of the reported DMRs. We use the proportion of true DMRs among top ranked DMRs as a metric for such evaluation. To benchmark our method, we also apply exomePeak, QNB, MeTDiff and RADAR on simulated datasets for comparison.
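The ranking metric is simple to state in code. The sketch below assumes that regions are ranked by P-value, smallest first, and that the simulation truth is available as a list of booleans; the function name and the toy values are hypothetical.

```python
def top_rank_precision(pvals, is_true_dmr, top_k):
    """Proportion of true DMRs among the top_k regions ranked by P-value."""
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    top = order[:top_k]
    return sum(1 for i in top if is_true_dmr[i]) / top_k


# Toy example: five regions, two of which are truly differential.
toy_pvals = [0.001, 0.2, 0.01, 0.5, 0.03]
toy_truth = [True, False, True, False, False]
precision_top3 = top_rank_precision(toy_pvals, toy_truth, top_k=3)
```

Here the three top-ranked regions contain both true DMRs and one false one, so the proportion is two thirds. Evaluating the function over a range of `top_k` values gives a curve of the kind compared across methods.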

Results displayed in Figure 3 are based on simulations of a simple two-group comparison design. As shown, the accuracy of all methods increases with the number of replicates (from left to right column). This is reasonable since the statistical test is expected to be more accurate with a larger sample size. However, TRESS outperforms the other methods in both cases. Another finding from Figure 3 is that all methods provide lower accuracy when dispersion becomes large (from top to bottom row). This is reasonable because, as dispersion grows, the biological variation among replicates increases and the read counts become more variable. Such variability makes it more difficult to precisely estimate treatment effects, especially when the sample size is small. Nevertheless, TRESS still outperforms the competing methods owing to the proper modeling of biological variation and the robust shrinkage estimation procedure for dispersion. Similar results are observed when the receiver operating characteristic (ROC) curves of all methods are compared (Supplementary Fig. S4).

Fig. 3.

Proportion of true DMRs among top 1000 DMRs identified by each method, from a two-group comparison simulation. Panels in left and right columns contain results for different sample sizes (two and five replicates). Panels in top and bottom rows contain results for different levels of biological variations (means of log-dispersion –5 and –4)

We next apply TRESS to a simulated dataset where the design also contains a continuous covariate (e.g. age) in addition to the binary treated/untreated condition. A method that can adjust for, and is not overly affected by, additional covariates is preferable, as it is more compatible with studies with complex designs. Results of these simulations are summarized in Supplementary Figure S5. Consistent with Figure 3, the accuracy of all methods increases with larger sample size and decreases with larger dispersion, but TRESS still performs the best in all scenarios. Together, these results demonstrate that TRESS is more accurate and is preferable under more general settings compared to exomePeak, QNB, MeTDiff and RADAR.

3.3 TRESS is robust against model misspecification

We next evaluate the robustness of TRESS when data are not generated from our proposed data model. We perform additional simulations under four scenarios of model misspecification: (i) using different scale parameters θ for IP and input data in both treated and untreated groups; (ii) sampling the Poisson parameters λ from log-normal distributions instead of gamma distributions; (iii) simulating data based on the QNB model described by Liu et al. (2017); and (iv) simulating data based on the MeTDiff model (details are provided in Supplementary Section S2.5).

Results of setting (i) are shown in Supplementary Figures S6 and S7, where TRESS still outperforms the other methods in all scenarios. In fact, the performance gains of TRESS over other methods are larger than in the results presented in Figure 3 and Supplementary Figure S5. Under this setting, the performance of TRESS remains about the same, but the other methods show a substantial performance reduction. This is because when samples have different θi, the variances of the respective λij’s become larger, and the scale of this inflation differs between IP and input data. Without properly handling variation in read counts, the performance of the other methods suffers considerably. In contrast, TRESS models the variation of the normalized IP/input ratio, which makes it more robust in such a scenario.

Results under sensitivity setting (ii) are shown in Supplementary Figures S8 and S9, which are similar to those in Figure 3 and Supplementary Figure S5. This similarity is reasonable because, with mean and variance fixed, the log-normal and gamma distributions are similar to each other (Supplementary Fig. S10).
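Matching a gamma and a log-normal distribution on mean and variance uses standard moment formulas, sketched below with hypothetical function names. How the supplementary simulation actually chose its log-normal parameters is not stated in this section, so the sketch only shows what "with mean and variance fixed" amounts to.

```python
import math


def gamma_params(mean, var):
    """Gamma (shape, scale) with the given mean and variance."""
    return mean * mean / var, var / mean


def lognormal_params(mean, var):
    """Log-normal (meanlog, varlog) with the given mean and variance."""
    varlog = math.log(1.0 + var / (mean * mean))
    return math.log(mean) - varlog / 2.0, varlog


def lognormal_moments(meanlog, varlog):
    """Mean and variance implied by log-normal parameters (for checking)."""
    mean = math.exp(meanlog + varlog / 2.0)
    return mean, (math.exp(varlog) - 1.0) * mean * mean


shape, scale = gamma_params(4.0, 8.0)
meanlog, varlog = lognormal_params(4.0, 8.0)
ln_mean, ln_var = lognormal_moments(meanlog, varlog)
```

Both parameter sets describe a distribution with mean 4 and variance 8. The two families still differ in their tails, which is what a misspecification simulation of this kind probes.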

Results under sensitivity setting (iii) are presented in Supplementary Figure S11. As shown, when data are simulated based on the QNB model, QNB and TRESS achieve the highest and second-highest AUC scores, respectively, when the methylation level differs only between treated and untreated conditions. However, TRESS and RADAR outperform the other methods when the data contain confounding effects from covariates such as age. Results under sensitivity setting (iv) are displayed in Supplementary Figure S12. Although the data are simulated based on the MeTDiff model, TRESS still reports the most accurate DMR ranking among all methods. Overall, the results of these sensitivity analyses demonstrate that TRESS is very robust against model misspecification.

3.4 TRESS provides more accurate statistical inference

Another important property of a statistical method is the accuracy of statistical inference. We have shown that TRESS provides better ranked DMRs. Here, we want to evaluate whether it provides good statistical inference in terms of controlling the type I error and FDR.

As we derive P-values based on a normality assumption for the Wald test statistics, we first demonstrate the validity of this approach by examining the distribution of the Wald statistics. The histogram of Wald statistics and the normal quantile–quantile (QQ) plots in Supplementary Figure S13 suggest that the statistics follow a normal distribution very well in the middle, with heavier tails on both sides that come from the DMRs. Furthermore, we investigate the P-value distributions from the hypothesis test (Supplementary Fig. S14), for all regions and for the background regions (with no differential methylation) only. Under the null (background regions), P-values from TRESS are uniformly distributed, while P-values from RADAR, MeTDiff and exomePeak all show a strong skewness toward zero. This skewness results in larger proportions of false positives from those methods. On the other hand, the null P-values from QNB skew toward 1, indicating an overly conservative test. Overall, the P-value distributions from TRESS show the best properties.

We further calculate the empirical type I error rate and FDR at nominal level of 0.05 for all methods from 100 simulations. As summarized in Figure 4, TRESS reports the most accurate type I error rates and FDRs across all scenarios. QNB and MeTDiff generate relatively accurate but variable type I errors and much more liberal FDRs. Moreover, the FDRs from MeTDiff are very sensitive to the number of replicates: with two replicates, its FDR is greatly inflated. Both RADAR and exomePeak are too liberal in both type I errors and FDRs, with exomePeak being even worse. Inference results by each method under other simulation settings are included in Supplementary Figures S12D, S15–S20, where TRESS still outperforms its alternatives across all scenarios. Overall, these results together support the validity of TRESS in using normal P-values and demonstrate that TRESS provides more accurate statistical inferences compared to RADAR, MeTDiff, QNB and exomePeak.
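The two empirical quantities can be computed from simulation truth as sketched below. The function name and the toy inputs are hypothetical; the sketch assumes at least one null region and returns an FDR of zero when nothing is called, which is a convention chosen here rather than one stated in the text.

```python
def empirical_error_rates(pvals, fdrs, is_true_dmr, level=0.05):
    """Empirical type I error rate and FDR at a nominal level.

    Type I error: fraction of null regions with P-value below the level.
    FDR: fraction of null regions among those with adjusted value below it.
    """
    null_p = [p for p, truth in zip(pvals, is_true_dmr) if not truth]
    type1 = sum(1 for p in null_p if p < level) / len(null_p)

    called = [truth for q, truth in zip(fdrs, is_true_dmr) if q < level]
    fdr = sum(1 for truth in called if not truth) / len(called) if called else 0.0
    return type1, fdr


# Toy example with two true DMRs and four null regions.
toy_p = [0.001, 0.02, 0.04, 0.3, 0.6, 0.9]
toy_q = [0.006, 0.04, 0.045, 0.4, 0.7, 0.9]
toy_truth = [True, True, False, False, False, False]
type1_rate, fdr_rate = empirical_error_rates(toy_p, toy_q, toy_truth)
```

In this toy input one of four null regions falls below the threshold, and one of the three called regions is a false discovery. Averaging such values over repeated simulations gives empirical rates that can be compared with the nominal 0.05.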

Fig. 4.

Empirical type I error rates and FDRs by each method under the respective nominal thresholds: P-value < 0.05 and FDR < 0.05. Here, the data are simulated purely based on our proposed model in a two-group comparison. Different panels contain results under different sample sizes and biological variations. The black dotted horizontal line in each panel marks the nominal level 0.05

3.5 TRESS detects DMRs with stronger biological signals

We next apply TRESS to two real datasets, both obtained from the GEO database. The first dataset (GEO accession no. GSE46705) contains four samples from the human HeLa cell line: one wild-type (WT) sample and three treated samples. The treatments correspond to the knockdown (KD) of the complex components METTL3, METTL14 and WTAP, respectively (Liu et al., 2014). Each sample contains two replicates. This dataset is referred to as the HeLa data. The second dataset (GSE144032) contains 2- and 6-week-old mouse brain samples from four brain regions: cerebellum, cortex, hippocampus and hypothalamus. Each sample contains two replicates. This dataset is referred to as the Young mouse data hereafter. For the HeLa data, we are interested in the differential methylation between KD and WT samples. We apply TRESS to identify methylated regions in all samples and obtain a total of 12 171 candidate DMRs. For the Young mouse data, we focus on the temporal difference of m6A methylation in each region. A total of 12 601, 12 351, 12 215 and 12 678 candidate DMRs are obtained for cerebellum, cortex, hippocampus and hypothalamus, respectively. As in the simulation study, we also apply RADAR, MeTDiff, QNB and exomePeak to these datasets for comparison.

The numbers of DMRs at FDR < 0.05 detected by the different methods in the HeLa data are shown in Table 2. TRESS calls in total 356, 112 and 66 significant DMRs from the comparisons of WTAP-KD versus WT, METTL14-KD versus WT and METTL3-KD versus WT samples, with 329, 93 and 46 being hypomethylated in the respective knockdown samples. These hypomethylated numbers are consistent with the findings of Liu et al. (2014), who generated the raw data and reported that WTAP has the largest effect on m6A levels, followed by METTL14 and METTL3. However, the reported effect order among the three methyltransferases does not hold for the other methods. For example, both exomePeak and QNB suggest that METTL3 has the largest effect on m6A levels, followed by METTL14 and WTAP. MeTDiff indicates that METTL14 has the largest effect.

Table 2.

Number of significant DMRs by each method at FDR < 0.05

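Comparison | TRESS Hypo | TRESS Hyper | RADAR Hypo | RADAR Hyper | exomePeak Hypo | exomePeak Hyper | QNB Hypo | QNB Hyper | MeTDiff Hypo | MeTDiff Hyper
WTAP-KD versus WT | 329 | 36 | 0 | 0 | 2402 | 1794 | 9 | 6 | 23 | 12
M14-KD versus WT | 93 | 19 | 0 | 0 | 2508 | 2327 | 22 | 102 | 5141 | 842
M3-KD versus WT | 46 | 20 | 198 | 446 | 1821 | 3560 | 656 | 2050 | 2074 | 87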

A closer comparison between the numbers of hypo- and hypermethylated DMRs further suggests a higher accuracy for TRESS. We refer to DMRs that are hypomethylated (hypermethylated) in knockdown samples as hypomethylated (hypermethylated) DMRs. As shown in Table 2, TRESS identifies more hypomethylated than hypermethylated DMRs across all comparisons. The same trend is observed for MeTDiff. This is the expected pattern because the depletion of methyltransferases should reduce the efficiency of transferring methyl groups to their substrates and thus lead to an overall decrease in methylation. The existence of a smaller number of hypermethylated DMRs in each comparison may be due to incomplete knockdown efficiency (Liu et al., 2014; Poh, 2022). However, the expected results obtained by TRESS are sometimes not obtained by other methods. For example, for METTL3-KD versus WT, exomePeak, QNB and RADAR call many more hypermethylated DMRs. QNB also reports more hypermethylated ones for METTL14-KD versus WT.

We then investigate the biological signals in the DMRs reported by each method. The biological signal can be roughly represented by the normalized fold change computed from the counts. Biologically, DMRs with strong signals are preferable to DMRs with weak signals, even if the latter have small P-values. A method that detects DMRs with both statistical and biological significance is therefore preferable. Here, we jointly examine the statistical and biological significance of DMRs using volcano plots.

As shown in Figure 5A for WTAP-KD versus WT (panels in the first row), the volcano plot from TRESS shows the most desirable pattern: candidate regions with larger fold changes tend to have lower FDRs. The volcano plot from exomePeak shows a similar pattern but is suspected to contain a much higher level of false positives, as there are many regions with very small fold changes but low FDRs. This result is consistent with Figure 4, where exomePeak yields inflated type I errors and FDRs. The FDRs reported by RADAR appear to have a lower bound, suggesting that the inference procedure of RADAR may not be appropriate. The volcano plot for MeTDiff is undesirable, as the P-values are almost independent of the fold changes. We next look at the percentage of DMRs with 'large' fold changes (defined as an absolute log2 fold change greater than 1) detected for WTAP-KD versus WT. As shown in Figure 5B, although all methods except RADAR report more hypomethylated regions, the DMRs from TRESS have the greatest proportion (∼70%) with large fold changes. In comparison, almost half of the DMRs reported by exomePeak and MeTDiff have very small fold changes.
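The joint criterion used in this comparison (FDR < 0.05 together with an absolute log2 fold change greater than 1) can be sketched as follows. This is only an illustration in Python: the function names, the toy (log2 fold change, FDR) pairs and the sign convention (positive values meaning higher methylation in knockdown) are assumptions, not TRESS output or the TRESS interface.

```python
def classify(log2_fc, fdr, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Label one candidate region as 'hyper', 'hypo' or 'NO'.

    Assumed convention: positive log2_fc means higher methylation in knockdown.
    """
    if fdr >= fdr_cutoff or abs(log2_fc) <= lfc_cutoff:
        return "NO"
    return "hyper" if log2_fc > 0 else "hypo"


def large_fc_proportion(regions, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Among regions with FDR below the cutoff, the share with |log2 FC| > lfc_cutoff."""
    significant = [(lfc, fdr) for lfc, fdr in regions if fdr < fdr_cutoff]
    if not significant:
        return float("nan")
    n_large = sum(abs(lfc) > lfc_cutoff for lfc, _ in significant)
    return n_large / len(significant)


# Hypothetical (log2 fold change, FDR) pairs, made up for illustration only.
toy_regions = [(-2.1, 0.001), (-1.4, 0.02), (-0.3, 0.01), (1.6, 0.04), (0.2, 0.60)]

labels = [classify(lfc, fdr) for lfc, fdr in toy_regions]
print(labels)
print(large_fc_proportion(toy_regions))
```

A region such as the third toy entry (small fold change but low FDR) is statistically significant yet carries little biological signal, which is the kind of call the volcano plots are meant to expose.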

Fig. 5.


Evaluation of biological significance of DMRs by each method for HeLa data. (A) Statistical significance versus log fold change for DMRs by TRESS, RADAR, QNB, exomePeak and MeTDiff. Each panel shows the volcano plot of DMRs by one method, where DMRs are categorized into three groups: hyper-methylated or hypo-methylated with abs(log2(FC))>=1, and NO otherwise. (B) Barplots showing the percentage of DMRs with large fold changes by each method, and the proportions of hyper- and hypo-methylated DMRs. In both (A) and (B), panels from top to bottom correspond to the comparisons WTAP-KD versus WT, METTL14-KD versus WT and METTL3-KD versus WT, respectively.

For the METTL14-KD versus WT and METTL3-KD versus WT comparisons (middle and bottom panels in Fig. 5A and B), TRESS also has the best-shaped volcano plots and the highest proportion of DMRs satisfying abs(log2(FC))>1. For METTL3-KD versus WT, although the total number of DMRs called by the other methods is much greater than that called by TRESS, only around 5% of them have large fold changes. In addition, the percentages of hypermethylated regions from the other methods are sometimes comparable to or even greater than those of hypomethylated regions, suggesting that their DMRs are less reliable than those from TRESS.

We next examine the overlap of DMRs among the different methods in Supplementary Figure S21. As the number of DMRs differs greatly between methods in all three comparisons, ranging from 0 to a few thousand, very few DMRs are shared by all methods. Nevertheless, TRESS and QNB have the smallest numbers of unique DMRs, while exomePeak has a moderate number of unique DMRs across all comparisons.

We repeat the above analyses for the Young mouse data, comparing 2- and 6-week-old mice; the number of DMRs by each method is given in Supplementary Table S1. Both RADAR and QNB detect zero temporal DMRs within cerebellum, hippocampus and hypothalamus, whereas TRESS and MeTDiff identify a limited number of temporal DMRs within all four brain regions. A closer check of the volcano plots shows that TRESS still outperforms the other methods in producing well-shaped volcano plots (Supplementary Fig. S22) and in reporting a high percentage of DMRs that satisfy abs(log2(FC))>1 (Supplementary Fig. S23). Taken together, these results demonstrate that DMRs from TRESS have greater biological significance and are thus more likely to be true than those from the alternative methods.

3.6 TRESS identifies biologically meaningful m6A regions

As the Young mouse data were originally generated to study the aging process and Alzheimer’s disease (AD), we examine here whether the top DMRs uniquely identified by TRESS are involved in pathways and/or gene ontology (GO) terms that are biologically relevant to AD. We use the 6- and 2-week-old mouse brain hypothalamus samples as an example. In Supplementary Table S2, we present the top 5 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and the top 5 GO biological processes enriched in the 319 genes that are identified from the unique top DMRs by TRESS.

In fact, all of the top five pathways have been reported to be associated with neurodegenerative diseases such as AD (Breteler et al., 1991; Chao et al., 2006; Chen et al., 2013; Dumbacher et al., 2018; Fu et al., 2010; Kosuru and Chrzanowska, 2020; Marathe et al., 2017; Peng et al., 2004). For example, thyroid hormone (TH) signaling plays a vital role in perinatal brain development, and thyroid dysfunction can affect neural stem cell (NSC) homeostasis, which could increase the risk of neurocognitive disorders (Breteler et al., 1991; Fu et al., 2010). Rap1, activated by the cAMP-dependent GEF Epac, controls cytoplasmic Ca2+ levels, whose dyshomeostasis and hyperactivity play a central role in AD pathology and progression (Dumbacher et al., 2018; Kosuru and Chrzanowska, 2020).

Among the genes involved in the above pathways, studies report that PSEN genes (e.g. PSEN2) might play a crucial role in early-onset AD because their mutations cause abnormal processing of the amyloid-β protein precursor (APP), which in turn causes the accumulation of amyloid-β (Aβ) peptide, postulated as a common initiating factor in AD pathogenesis (Cai et al., 2015; Levy-Lahad et al., 1995; Sherrington et al., 1995; Uemura et al., 2011). DUSP6 is reported to potentially have a neuroprotective effect against Aβ-induced toxicity in NSCs, because its overexpression increased the viability of NSCs after Aβ treatment, suggesting that DUSP6 is a potential treatment target for AD (Liao et al., 2018).

In addition to pathways, the same set of genes from TRESS is also involved in biological processes such as miRNA processing and regulation of the cell cycle. Overall, the genes uniquely identified from the top DMRs of TRESS in the Young mouse data are biologically meaningful and can provide candidates for the study of neurocognitive disorders such as AD.

3.7 TRESS identifies interaction DMRs from multivariate designs

One important feature of TRESS is that it can test any linear combination of the experimental factors in a multi-factor design. For example, TRESS can detect interaction effects, which cannot be handled by exomePeak, QNB or MeTDiff. Although RADAR can test interactions, its software design makes this inconvenient: one needs to change the design, explicitly specify the interaction and then refit the model for testing, which greatly increases the computational burden.
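To make the idea of testing a linear combination of factors concrete, the sketch below writes a 2 x 2 time-by-region design as a design matrix and expresses the interaction as a contrast vector. It is a generic linear-algebra illustration in Python with NumPy, not the TRESS model or its interface: the cell means are hypothetical numbers, the coding of the factors is an assumption, and the hierarchical negative binomial fitting and testing steps are not reproduced.

```python
import numpy as np

# Hypothetical mean methylation levels for each (region, time) cell; made-up values.
cell_means = {
    ("cortex", "2wk"): 0.30,
    ("cortex", "6wk"): 0.50,
    ("hypothalamus", "2wk"): 0.40,
    ("hypothalamus", "6wk"): 0.45,
}

# Design columns (assumed coding): intercept, time (6wk = 1),
# region (hypothalamus = 1), time:region interaction.
rows, y = [], []
for (region, time), mu in cell_means.items():
    t = 1.0 if time == "6wk" else 0.0
    r = 1.0 if region == "hypothalamus" else 0.0
    rows.append([1.0, t, r, t * r])
    y.append(mu)
X = np.array(rows)
y = np.array(y)

# Saturated model: four cells and four coefficients, so the fit is exact.
beta = np.linalg.solve(X, y)

# Each hypothesis is a linear combination c @ beta of the same fitted coefficients.
contrasts = {
    "time effect in cortex": np.array([0, 1, 0, 0]),
    "time effect in hypothalamus": np.array([0, 1, 0, 1]),
    "time-by-region interaction": np.array([0, 0, 0, 1]),
}
estimates = {name: float(c @ beta) for name, c in contrasts.items()}

for name, value in estimates.items():
    print(f"{name}: {value:+.2f}")
```

The interaction contrast equals the difference between the two temporal differences, so a single fit supports all three hypotheses by changing only the contrast vector, with no refitting.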

Here, we investigate interaction DMR detection using the Young mouse data. As mentioned earlier, these data contain 2- and 6-week-old samples from four different mouse brain regions. We take the 2- and 6-week-old cortex and hypothalamus samples as an example. TRESS identifies 27 DMRs at FDR < 0.05 that are attributable to the interaction between time and region. Supplementary Figure S24 shows the normalized IP and input read counts across all samples in 11 DMRs, each of which overlaps one particular gene. Depending on their methylation status, these genes are divided into different groups (Supplementary Table S3). In essence, the temporal difference in the methylation levels of these genes varies between brain regions. Such changes reported by TRESS contribute to more meaningful downstream analysis of the function of genes modulated by m6A. For example, in cortex, as visualized using IGV (Thorvaldsdóttir et al., 2013), the DMR on gene PRNP (Supplementary Fig. S25) is hyper-methylated in 6-week-old samples compared to 2-week-old samples. In hypothalamus, however, it shows no temporal difference in methylation level. Studies show that PRNP affects memory, as loss of function of this gene can lead to significant cognitive abnormalities (Balducci et al., 2010; Criado et al., 2005; Linden et al., 2008). In addition to PRNP, other genes such as NCOR1 and KANK1 also contain interaction DMRs. In hypothalamus, both NCOR1 and KANK1 are hyper-methylated in older samples, whereas in cortex they are either hypo-methylated in older samples or show no temporal difference (Supplementary Fig. S24). It has been reported that the depletion of NCOR1 is associated with memory impairment through a novel GABAergic hypothalamus-CA3 projection (Zhou et al., 2019), and the deletion of KANK1 has been reported to be associated with cerebral palsy (Lerer et al., 2005). The spatial changes in temporal methylation difference on these genes suggest a potential region-specific function of these genes. These results further demonstrate the flexibility of statistical inference with TRESS.

4 Discussion

RNA epigenetics is a relatively new but very active research direction in the field of epigenetics. MeRIP-seq is a high-throughput technology to profile transcriptome-wide RNA epigenetic modifications. MeRIP-seq data possess properties similar to a combination of RNA-seq and ChIP-seq. Compared to the large body of work on RNA-seq and ChIP-seq data, methodological development for MeRIP-seq is seriously lacking. There are only a few methods available for differential methylation analysis of MeRIP-seq data, all with serious limitations. In this work, we develop a novel and rigorous statistical method to detect differentially methylated RNA regions from MeRIP-seq data. Our method models the sequence read counts with a hierarchical negative binomial model that accounts for variation from multiple sources, including sequencing depth and within-group variance, and connects the sample-specific methylation levels with experimental factors through a linear framework to allow for flexible statistical inference in DM detection.

We perform extensive simulation studies to demonstrate that TRESS provides DMRs with more accurate ranking and statistical inference than existing methods. Additional simulations show that TRESS is robust to model misspecification. In real data analyses, we show that TRESS reports DMRs with greater biological significance than the existing methods. In addition, TRESS is more flexible and efficient in analyzing datasets with multiple factors, as evidenced by its calling of the time-by-region interaction DMRs in the Young mouse data. Overall, TRESS is more accurate, flexible, robust and efficient than existing methods.

TRESS conducts a filtering step on candidate DMRs before the statistical modeling step. It filters out regions with small relative dispersion in methylation level, where the relative dispersion is measured by the marginal coefficient of variation. For this step, TRESS pools all data without considering the outcome (such as treated/untreated group assignments). The filtering is therefore outcome-independent, which makes the downstream inference of TRESS different from selective inference (Taylor and Tibshirani, 2015), which tests associations that have already been mined from the same data. This type of outcome-independent filtering is common in many other high-throughput data analysis methods, for example, filtering out genes with very low total counts in differential expression analysis of RNA-seq data (Love et al., 2014). We further demonstrate in simulation studies (Supplementary Fig. S26) that the filtering step has very little impact on downstream power and FDR.
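The outcome-independent nature of this filter can be illustrated with a short Python sketch. It assumes a matrix of per-sample methylation level estimates (regions in rows, samples in columns) and a hypothetical cutoff; the function names, the toy values and the cutoff are illustrative assumptions, and the estimator of methylation level actually used by TRESS is not reproduced here.

```python
import numpy as np


def marginal_cv(levels):
    """Coefficient of variation (sd / mean) of each region across all pooled samples."""
    levels = np.asarray(levels, dtype=float)
    mean = levels.mean(axis=1)
    sd = levels.std(axis=1, ddof=1)
    # Regions with a zero mean get a CV of 0 instead of a division error.
    return np.divide(sd, mean, out=np.zeros_like(sd), where=mean > 0)


def keep_variable_regions(levels, cv_cutoff):
    """Boolean mask of regions whose marginal CV reaches the (hypothetical) cutoff."""
    return marginal_cv(levels) >= cv_cutoff


# Hypothetical methylation levels: 3 regions x 4 samples (made-up values).
levels = np.array([
    [0.50, 0.50, 0.50, 0.50],   # no variation across samples
    [0.20, 0.30, 0.60, 0.70],   # strong variation across samples
    [0.40, 0.42, 0.41, 0.39],   # little variation across samples
])

mask = keep_variable_regions(levels, cv_cutoff=0.1)
print(mask)

# Group labels never enter the computation, so reordering the samples
# (equivalently, reassigning them to groups) leaves the filter unchanged.
shuffled = levels[:, [2, 0, 3, 1]]
print(keep_variable_regions(shuffled, cv_cutoff=0.1))
```

Because the statistic depends only on the pooled samples, the same regions pass the filter whatever the group assignment, which is the property that separates this step from selecting regions based on the tested outcome.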

There are some directions we would like to pursue in the near future. First, we observe that changes in methylation appear to be associated with changes in gene expression. Based on this observation, we would like to investigate the relationship between DM and DE more carefully, and to use this information to further improve the performance of DM calling. In addition, since the samples profiled in MeRIP-seq experiments are often mixtures of different cell types (e.g. brain or blood), it is desirable to identify cell-type-specific differential methylation. Similar methods have been proposed for DNA methylation and gene expression data (Li and Wu, 2019; Li et al., 2019). Developing methods for cell-type-specific m6A analysis is also among our research interests for the near future.

Funding

This work was partly supported by several grants from the National Institutes of Health [GM122083 and GM141392 to H.W., NS111602, MH116441 and HG008935 to P.J.].

Conflict of Interest: none declared.

Supplementary Material

btac601_Supplementary_Data

Contributor Information

Zhenxing Guo, Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA.

Andrew M Shafik, Department of Human Genetics, Emory University, Atlanta, GA 30322, USA.

Peng Jin, Department of Human Genetics, Emory University, Atlanta, GA 30322, USA.

Hao Wu, Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA.

Data Availability

The two MeRIP-seq datasets used in this manuscript are publicly available. The Young mouse data are available at GEO under accession code GSE144032. The HeLa data are available at GEO under accession code GSE46705. The R/Bioconductor package TRESS is freely available at https://bioconductor.org/packages/devel/TRESS.

References

  1. Balducci C. et al. (2010) Synthetic amyloid-β oligomers impair long-term memory independently of cellular prion protein. Proc. Natl. Acad. Sci. USA, 107, 2295–2300.
  2. Benjamini Y., Hochberg Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (Methodological), 57, 289–300.
  3. Breteler M. et al.; A Hofman for the Eurodem Risk Factors Research Group. (1991) Medical history and the risk of Alzheimer’s disease: a collaborative re-analysis of case–control studies. Int. J. Epidemiol., 20, S36–S42.
  4. Cai Y. et al. (2015) Mutations in presenilin 2 and its implications in Alzheimer’s disease and other dementia-associated disorders. Clin. Interv. Aging, 10, 1163–1172.
  5. Chao M.V. et al. (2006) Neurotrophin signalling in health and disease. Clin. Sci. (Lond.), 110, 167–173.
  6. Chen X.-F. et al. (2013) Transcriptional regulation and its misregulation in Alzheimer’s disease. Mol. Brain, 6, 44–49.
  7. Criado J.R. et al. (2005) Mice devoid of prion protein have cognitive deficits that are rescued by reconstitution of PRP in neurons. Neurobiol. Dis., 19, 255–265.
  8. Cui X. et al. (2018) MeTDiff: a novel differential RNA methylation analysis for meRIP-seq data. IEEE/ACM Trans. Comput. Biol. Bioinform., 15, 526–534.
  9. Dominissini D. et al. (2012) Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq. Nature, 485, 201–206.
  10. Dumbacher M. et al. (2018) Modifying Rap1-signalling by targeting Pde6δ is neuroprotective in models of Alzheimer’s disease. Mol. Neurodegen., 13, 1–21.
  11. Engel M. et al. (2018) The role of m6A/m-RNA methylation in stress response regulation. Neuron, 99, 389–403.e9.
  12. Fu A.L. et al. (2010) Thyroid hormone prevents cognitive deficit in a mouse model of Alzheimer’s disease. Neuropharmacology, 58, 722–729.
  13. Guo Z. et al. (2021) Detecting m6A methylation regions from methylated RNA immunoprecipitation sequencing. Bioinformatics, 37, 2818–2824.
  14. He L. et al. (2019) Functions of N6-methyladenosine and its role in cancer. Mol. Cancer, 18, 176.
  15. Kosuru R., Chrzanowska M. (2020) Integration of Rap1 and calcium signaling. IJMS, 21, 1616.
  16. Lan Q. et al. (2019) The critical role of RNA m6A methylation in cancer. Cancer Res., 79, 1285–1292.
  17. Lerer I. et al. (2005) Deletion of the ANKRD15 gene at 9p24.3 causes parent-of-origin-dependent inheritance of familial cerebral palsy. Hum. Mol. Genet., 14, 3911–3920.
  18. Levy-Lahad E. et al. (1995) Candidate gene for the chromosome 1 familial Alzheimer’s disease locus. Science, 269, 973–977.
  19. Li Z., Wu H. (2019) Toast: improving reference-free cell composition estimation by cross-cell type differential analysis. Genome Biol., 20, 190.
  20. Li Z. et al. (2019) Dissecting differential signals in high-throughput data from complex tissues. Bioinformatics, 35, 3898–3905.
  21. Liao W. et al. (2018) Dual specificity phosphatase 6 protects neural stem cells from β-amyloid-induced cytotoxicity through erk1/2 inactivation. Biomolecules, 8, 181.
  22. Lin X. et al. (2019) RNA m6A methylation regulates the epithelial mesenchymal transition of cancer cells and translation of snail. Nat. Commun., 10, 1–13. [Retracted]
  23. Linden R. et al. (2008) Physiology of the prion protein. Physiol. Rev., 88, 673–728.
  24. Liu J. et al. (2014) A METTL3–METTL14 complex mediates mammalian nuclear RNA N6-adenosine methylation. Nat. Chem. Biol., 10, 93–95.
  25. Liu J. et al. (2020) N6-methyladenosine of chromosome-associated regulatory RNA regulates chromatin state and transcription. Science, 367, 580–586.
  26. Liu L. et al. (2017) QNB: differential RNA methylation analysis for count-based small-sample sequencing data with a quad-negative binomial model. BMC Bioinformatics, 18, 1–12.
  27. Love M.I. et al. (2014) Moderated estimation of fold change and dispersion for RNA-seq data with deseq2. Genome Biol., 15, 550.
  28. Marathe S. et al. (2017) Jagged1 is altered in Alzheimer’s disease and regulates spatial memory processing. Front. Cell. Neurosci., 11, 220.
  29. Meng J. et al. (2013) Exome-based analysis for RNA epigenome sequencing data. Bioinformatics, 29, 1565–1567.
  30. Park P.J. (2009) Chip-seq: advantages and challenges of a maturing technology. Nat. Rev. Genet., 10, 669–680.
  31. Peng S. et al. (2004) Increased proNGF levels in subjects with mild cognitive impairment and mild Alzheimer disease. J. Neuropathol. Exp. Neurol., 63, 641–649.
  32. Poh H.X. et al. (2022) Alternative splicing of METTL3 explains apparently METTL3-independent m6A modifications in mRNA. PLoS Biol., 20, e3001683.
  33. Roundtree I.A. et al. (2017) Dynamic RNA modifications in gene expression regulation. Cell, 169, 1187–1200.
  34. Shafik A.M. et al. (2021) N6-methyladenosine dynamics in neurodevelopment and aging, and its potential role in Alzheimer’s disease. Genome Biol., 22, 1–19.
  35. Sherrington R. et al. (1995) Cloning of a gene bearing missense mutations in early-onset familial Alzheimer’s disease. Nature, 375, 754–760.
  36. Taylor J., Tibshirani R.J. (2015) Statistical learning and selective inference. Proc. Natl. Acad. Sci. USA, 112, 7629–7634.
  37. Thorvaldsdóttir H. et al. (2013) Integrative genomics viewer (IGV): high-performance genomics data visualization and exploration. Brief. Bioinform., 14, 178–192.
  38. Uemura K. et al. (2011) Reciprocal relationship between app positioning relative to the membrane and PS1 conformation. Mol. Neurodegen., 6, 1–10.
  39. Zhang Z. et al. (2019) Radar: differential analysis of meRIP-seq data with a random effect model. Genome Biol., 20, 1–17.
  40. Zhou W. et al.; DDD study. (2019) Loss of function of NCOR1 and NCOR2 impairs memory through a novel GABAergic hypothalamus–CA3 projection. Nat. Neurosci., 22, 205–217.


Articles from Bioinformatics are provided here courtesy of Oxford University Press
