MntJULiP and Jutils: differential splicing analysis of RNA-seq data with covariates

Wui Wang Lui; Guangyu Yang; Zitong He; Liliana Florea

doi:10.1093/nargab/lqaf140

2025 Nov 3;7(4):lqaf140. doi: 10.1093/nargab/lqaf140

PMCID: PMC12582023 PMID: 41189648

Abstract

Emerging large and complex RNA-seq datasets from disease and population studies include multiple confounders such as sex, age, ethnicity, and clinical attributes, which demand highly specialized data analysis tools. However, current methods are generally not equipped to handle the new challenges. We describe an extension of our programs MntJULiP and Jutils for differential splicing detection and visualization from RNA-seq data that accounts for covariates. MntJULiP detects intron-level differences in both splicing ratios and splicing abundance from RNA-seq data using a Bayesian linear mixture model adjusted for covariates. Jutils visualizes alternative splicing variation with heatmaps, sashimi plots, Venn diagrams, and, reported here, PCA maps. With covariate modeling, MntJULiP drastically reduces false positives to achieve very high precision (>90%), significantly outperforming competitors. We applied the methods to GTEx brain RNA-seq samples to deconvolute the effects of sex and age at death on the splicing patterns. In particular, analyses of frontal cortex data reveal a pattern of increased splicing differences with more distant age groups, while clustering of covariate-adjusted data identifies a subgroup of individuals undergoing a distinct splicing program over the age span.

Introduction

Differences in alternative splicing patterns are responsible for the diversity of proteins across tissues, cell types, and developmental stages, and disruptions in normal RNA splicing patterns have been reported in a number of diseases [1, 2]. Increasingly large and complex RNA-seq datasets that include multiple confounders, such as sex, age, ethnicity, and clinical attributes, are emerging from disease study cohorts and population-level projects, which demand highly specialized analysis tools. While multiple methods exist to detect differences in splicing, including LeafCutter, MntJULiP, rMATS, SUPPA2, DRIMSeq, DEXSeq, and DARTS [3–9], few are equipped to handle the complexities of the data. In particular, there is a scarcity of programs that can rigorously account for the effect of confounding attributes on the observed data. Additionally, visualization tools are critical for enabling biologists to quickly and intuitively interpret such differences, identifying patterns genome-wide or at the level of the individual genes. We previously developed two tools, MntJULiP [4] and Jutils [10], for differential splicing detection and visualization, respectively, that can efficiently handle large-scale and complex RNA-seq data collections. We report a recent implementation of these programs to account for covariates and to produce new customizable visualizations of covariate-adjusted data as PCA plots. We demonstrate their accuracy on simulated data. We then illustrate their applicability and usefulness by analyzing RNA-seq data from brain tissue obtained from the GTEx repository [11], which paints the global splicing landscape across age and biological sex groups and identifies a potential distinct splicing program in a subset of individuals.

Materials and methods

Overview of the MntJULiP tool

MntJULiP [4] detects differences in splicing at the intron level, which greatly improves performance when transcript reconstructions are inaccurate or incomplete. It identifies events directly from the alignment data, without relying on a reference gene annotation, and therefore can find and report novel events. MntJULiP can detect both differences in the introns’ splicing ratios (DSR), and changes in the abundance level of introns (DSA), and thus can capture alternative splicing variations in a comprehensive way. For DSA, it considers each intron individually and models read counts with a zero-inflated negative binomial (ZINB) distribution. For DSR, it groups introns sharing a splice site into a “bunch” and uses a Dirichlet multinomial (DM) distribution to simultaneously model all introns in a group (Supplementary Fig. S1). Additionally, MntJULiP has the ability to perform multiple comparisons simultaneously, which we showed was more accurate at capturing global differences in a time series or complex experiments [4]. We modified the Bayesian mixture models of MntJULiP to account for linear effects of covariates as described below.

Covariate-augmented Bayesian models

We introduce covariate effects as linear components in the MntJULiP Bayesian models to account for extraneous attributes that could bias the analysis. Numerical covariates, such as age and weight, are centered and scaled, while categorical covariates, such as biological sex and ethnicity, are encoded into a numerical representation (0, 1, … K-1). Using the covariate-augmented models, we generate new adjusted counts and PSI values per sample, which can be used to generate global views of the alternative splicing variation as heatmaps or PCA plots using Jutils or other visualization tools.
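The covariate preprocessing described above can be sketched as follows. This is an illustrative sketch, not MntJULiP's own code: the function names are hypothetical, the use of the population standard deviation for scaling is an assumption, and so is the alphabetical ordering of category levels.

```python
from statistics import mean, pstdev


def center_and_scale(values):
    """Center a numerical covariate at 0 and scale it to unit spread.

    Assumption: scaling uses the population standard deviation.
    """
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        # A constant covariate carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def encode_categorical(labels):
    """Map K category labels to the integers 0, 1, ..., K-1.

    Assumption: levels are ordered alphabetically.
    """
    levels = sorted(set(labels))
    index = {level: code for code, level in enumerate(levels)}
    return [index[label] for label in labels], levels


# Hypothetical sample sheet: age (numerical) and biological sex (categorical).
ages = [20, 40, 60]
sexes = ["M", "F", "F"]
scaled_ages = center_and_scale(ages)
sex_codes, sex_levels = encode_categorical(sexes)
```

Each sample then contributes one row vector x_i that combines the scaled numerical values and the integer codes.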

The differential splicing abundance (DSA) model

This model tests an intron for differences in abundance among K conditions. Let N be the number of samples, K the number of conditions, and P the number of covariates (including the comparison). The covariate-augmented read count y of intron v in sample i follows a zero-inflated negative binomial distribution ZINB(μ_k + x_iβ_k + a_k, θ) with mean μ, K coefficient column vectors β_k of length P, sample intercept a_k, and N covariate row vectors x_i of length P. We re-fit the prior on the sample mean, previously introduced to capture the variances across different conditions and within individual samples, to a normal distribution, and the prior on the dispersion parameter to an inverse Half Cauchy (HC) distribution: φ⁻¹ ∼ sqrt[HC(0, 1)]. Together, this likelihood and these priors define the covariate-augmented zero-inflated negative binomial (ZINB) Bayesian model.
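Collecting the definitions in this section gives the following sketch of the DSA model. It is reconstructed from the text rather than copied from the typeset equation. The index k(i), denoting the condition of sample i, is notation introduced here, and the prior on μ is omitted because its parameters are not stated in the text.

```latex
\begin{aligned}
y_{iv} &\sim \mathrm{ZINB}\!\left(\mu_{k(i)} + x_i \beta_{k(i)} + a_{k(i)},\ \theta\right), \qquad i = 1, \dots, N,\\
\phi^{-1} &\sim \sqrt{\mathrm{HC}(0, 1)}
\end{aligned}
```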

Maximum likelihood estimation is performed with respect to parameters µ, β, a, and θ, separately for the null and alternative models (i.e. samples generated from a single-condition and from a K-condition experiment, respectively) to obtain log likelihoods L(θ₀) and L(θ₁) for testing [4].
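As an illustration of how two such log likelihoods can be compared, the sketch below assumes a standard likelihood-ratio test with a chi-square reference distribution. The text only says that the log likelihoods are used "for testing", so the test form, the degrees of freedom, and the function names are all assumptions made here; see [4] for the actual procedure.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Closed-form starting points: df = 2 (a = 1) or df = 1 (a = 1/2).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    # Recurrence Q(a + 1, z) = Q(a, z) + z^a e^{-z} / Gamma(a + 1).
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(loglik_null, loglik_alt, df):
    """Return (statistic, p-value) for nested models.

    Assumption: df is the number of extra free parameters in the
    alternative model; how MntJULiP sets it is not stated in the text.
    """
    statistic = max(0.0, 2.0 * (loglik_alt - loglik_null))
    return statistic, chi2_sf(statistic, df)


# Hypothetical log likelihoods for one intron.
stat, pval = likelihood_ratio_test(-105.0, -100.0, df=1)
```

A small p-value indicates that the K-condition model explains the counts better than the single-condition model by more than the extra parameters alone would account for.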

The differential splicing ratio (DSR) model

The model tests the competing introns within a “bunch” for differences in splicing ratios among conditions. The covariate-augmented read counts y_i1, y_i2, …, y_iM in sample i for a “bunch” with M introns follow a Dirichlet multinomial distribution with concentration parameters α_i1, α_i2, …, α_iM, the M coefficient row vectors β_m of length P, the intercepts a_m, N covariate column vectors x_i of length P, and “bunch” total n_iB = Σ_m′y_im′ [3]:
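The distributional statement above can be sketched as follows. This is a reconstruction from the text, not the typeset equation. In particular, the log-linear link between the concentration parameters and the covariates in the second line is an assumption made here for illustration, since the text does not state the link.

```latex
\begin{aligned}
(y_{i1}, \dots, y_{iM}) &\sim \mathrm{DM}\!\left(n_{iB};\ \alpha_{i1}, \dots, \alpha_{iM}\right), \qquad n_{iB} = \sum_{m'=1}^{M} y_{im'},\\
\log \alpha_{im} &= a_m + \beta_m x_i \quad \text{(assumed link)}
\end{aligned}
```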

Maximum likelihood estimation is performed with respect to parameters α, β, and a, separately for the null and alternative models (excluding and including the condition column x, respectively) to obtain the log likelihoods L(θ₀) and L(θ₁) for testing [4].

Covariate-adjusted PSI and abundance estimates

Using the covariate-augmented models above, we generate new adjusted counts and Percent Spliced In (PSI) values per sample. After regressing out confounders, we obtain residuals r, applying a weakly informative prior r ∼ N(0, 10). For abundance estimates, r is estimated for the alternative model with the optimized μ, β, a, and θ.

For splicing ratio (PSI) estimates, r is estimated for the alternative model using the optimized α, β, and a [1].

The negative binomial and the Dirichlet multinomial models are implemented in PyStan, a Python package for Bayesian inference. Estimated counts were added to the output “intron_data.txt” file, and estimated PSI values were reported in a new file “group_data.txt,” complementing the existing raw count values.

Differential splicing visualization with Jutils

Jutils [10] is a Python package for visualization of differential splicing events in the form of heatmaps, sashimi plots, and Venn diagrams of sets of differentially spliced genes. Jutils can be used with virtually any differential splicing detection tool and is specifically configured to work with the output of popular methods, including LeafCutter, MntJULiP, and rMATS. It converts the output of each tool into an intermediate TSV-formatted file to render the data in a unified format, described in [10]. This file is then sufficient to create visualizations, making it lightweight and very well suited for collaborations.

We extend Jutils and introduce new visualizations of PCA plots to identify potential unknown relationships among samples and to observe the effect of covariates on the data. PCA plots are generated from the splicing ratios (PSI) for DSR methods, or read counts for DSA methods, contained in the TSV file and imported from the output of differential splicing programs. As customizable features, input data can be filtered by significance, and data points can be differentiated by color, shape, and label based on conditions or covariates.

Validation of covariate models on simulated data

To validate the covariate models, we simulated data for one “condition,” with values “control,” “disease,” and “stage2,” with one covariate, “biological sex,” with values “M” and “F” as described in [4]. Specifically, changes were simulated in the expression (DE) and/or the splicing ratio (DS) of genes. Changes in expression (DE) were simulated by either halving or doubling the expression level of the gene. Changes in splicing ratios (DS) were simulated by swapping the expression levels of the gene’s top two transcript isoforms.
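The two perturbations can be sketched on a per-gene list of isoform expression levels. This is a minimal illustration with hypothetical function names, not the simulation pipeline of [4]. It assumes that a gene-level expression change is applied by scaling all isoforms of the gene by the same factor.

```python
def simulate_de(isoform_expr, direction):
    """Double ('up') or halve ('down') the expression of a gene.

    Assumption: every isoform is scaled by the same factor, so the
    splicing ratios are unchanged.
    """
    factor = {"up": 2.0, "down": 0.5}[direction]
    return [level * factor for level in isoform_expr]


def simulate_ds(isoform_expr):
    """Swap the expression levels of the gene's top two isoforms."""
    if len(isoform_expr) < 2:
        raise ValueError("need at least two isoforms to swap")
    # Indices of isoforms ordered from highest to lowest expression.
    order = sorted(range(len(isoform_expr)),
                   key=lambda j: isoform_expr[j], reverse=True)
    first, second = order[0], order[1]
    swapped = list(isoform_expr)
    swapped[first], swapped[second] = swapped[second], swapped[first]
    return swapped


# Hypothetical gene with three isoforms.
gene = [5.0, 1.0, 9.0]
ds_gene = simulate_ds(gene)                   # splicing ratio change only
de_gene = simulate_de(gene, "up")             # expression change only
de_ds_gene = simulate_de(ds_gene, "down")     # both changes combined
```

Swapping leaves the gene's total expression unchanged, so under these assumptions a DS-only gene differs in ratios but not in overall abundance.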

For the pairwise comparisons, differences due to “condition” between the “control” and the “disease” states were simulated at 600 genes, including 200 DE, 200 DS, and 200 DE + DS genes. Differences in “biological sex” (covariate) were represented as changes in 300 genes, 100 from each of the DS, DE, and DE + DS categories. Consequently, the target gene set for the DSR pairwise comparison consists of the pooled 200 DS and 200 DS + DE genes differentially spliced between the “control” and “disease” states, while for the DSA pairwise comparison the target gene set is the set of 600 modified genes, 200 for each of the DS, DE, and DS + DE categories (see Supplementary Table S1).

For the three-way comparisons, changes at 100 of the previously modified genes were maintained, and additional changes between “disease” and “stage2” were made to a set of 200 additional genes not encountered previously, for each of the DE, DS, and DE + DS categories. Therefore, for the DSR three-way comparison, the target represents the 800 genes simulated as being DS or DE + DS between any of the “control,” “disease,” and “stage2” categories, while for the multi-way DSA comparison, the target is the full set of 1200 genes (400 DE, 400 DS, and 400 DE + DS) simulated to have changed between any of the “control,” “disease,” and “stage2” states (see Supplementary Table S2).

Comparative program evaluation

Simulated reads were aligned to the human genome sequence GRCh38 using STAR v2.7.10a [12] and were analyzed for alternative splicing detection using the programs MntJULiP v.1.15.2, LeafCutter v.0.2.9 [3], DRIMSeq v.1.30.0 [8], and DEXSeq v.1.52.0 [9] for DSR, and with the differential gene expression tool DESeq2 v.1.42.1 [13] with p-val ≤ 0.05 and q-val ≤ 0.05 cutoffs for DSA, with and without accounting for covariates. For DRIMSeq, we used the “batch” variable to account for the “biological sex” covariate. MntJULiP, LeafCutter, DRIMSeq, and DESeq2 use “introns” as features in the comparisons, whereas DEXSeq is based on exons. Additionally, for DRIMSeq and DESeq2, we used as input the set of introns curated by MntJULiP’s intron extraction tool, “junc.” For each method, the set of genes with predicted events was used in the evaluation. Lastly, we use the term “accuracy” to refer to the overall correctness of the methods and employ conventional measures [sensitivity, Sn = TP/(TP + FN); precision, Pr = TP/(TP + FP), and the F-value, F = 2*Sn*Pr/(Sn + Pr)] to assess it.
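The evaluation measures defined above can be computed from a predicted and a target gene set as in the sketch below. The function names and the example gene identifiers are hypothetical; only the formulas come from the text.

```python
def sensitivity(tp, fn):
    """Sn = TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp, fp):
    """Pr = TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def f_value(sn, pr):
    """F = 2 * Sn * Pr / (Sn + Pr)."""
    return 2 * sn * pr / (sn + pr) if (sn + pr) else 0.0


def evaluate(predicted_genes, target_genes):
    """Compare the genes with predicted events against the simulated targets."""
    predicted, target = set(predicted_genes), set(target_genes)
    tp = len(predicted & target)   # predicted and truly modified
    fp = len(predicted - target)   # predicted but not modified
    fn = len(target - predicted)   # modified but missed
    sn, pr = sensitivity(tp, fn), precision(tp, fp)
    return {"TP": tp, "FP": fp, "FN": fn,
            "Sn": sn, "Pr": pr, "F": f_value(sn, pr)}


# Hypothetical gene identifiers, for illustration only.
result = evaluate(["g1", "g2", "g3"], ["g2", "g3", "g4", "g5"])
```

As a consistency check, f_value(0.613, 0.945) evaluates to about 0.744 and f_value(0.860, 0.913) to about 0.886, which agrees with the sensitivity, precision, and F-values reported in the Results for the DSR and DSA pairwise comparisons.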

Analyses of GTEx data

Alignments of GTEx RNA-seq reads were those reported in [4], produced by HISAT2 [14] with RefSeq transcript annotations. Gene functional classification, and Gene Ontology (GO) and pathway enrichment analyses were performed with the tools DAVID [15] and Metascape [16], using a False Discovery Rate (FDR) < 0.1 for significance.

Results

Method evaluation on control data

To validate the covariate models, we simulated data for one “condition” variable, with values “control,” “disease,” and “stage2,” with one covariate, “biological sex,” with values “M” and “F.” For each condition, we generated 10 samples in each of the categories “condition” x “biological sex.” Starting from an empirical transcript expression matrix trained on an RNA-seq data set from lung fibroblasts (GenBank accession no. SRR493366) and using GENCODE v.41 as reference, we generated 11.5 million 100 bp long paired-end reads per sample from 2000 genes with two or more expressed isoforms. Changes in splicing were simulated as described in [4], separately for the DSR and the DSA models; detailed descriptions are provided in the “Materials and methods” section.

We evaluated the performance of MntJULiP in splicing ratio (DSR) pairwise comparison on the simulated data, with and without covariates, alongside LeafCutter, DRIMSeq, and DEXSeq, which are the only other programs to implement covariates, in an imbalanced comparison of (8M, 2F) “control” versus (8F, 2M) “disease” samples (Fig. 1A). MntJULiP drastically reduced the false positives due to covariate bias while achieving sensitivity comparable to the original implementation, thus improving the accuracy. MntJULiP outperformed all other programs as measured by the F-value (0.744, followed by LeafCutter at 0.680), achieving the highest precision (0.945) and a sensitivity comparable to that of the top program (0.613 versus 0.628 for DEXSeq). Further, the PCA plots of the estimated PSI values indicate that variation due to “biological sex” was correctly removed (Fig. 1B).

Figure 1. Evaluation of MntJULiP’s covariate function for pairwise comparison on simulated data. (A) MntJULiP was evaluated alongside LeafCutter, DRIMSeq, and DEXSeq for DSR comparisons (top), and DESeq2 (with p-val ≤ 0.05 and q-val ≤ 0.05 cutoffs) for DSA comparisons (bottom). On the left, performance was evaluated and shown using standard measures: Sn = TP/(TP + FN), Pr = TP/(TP + FP), and F-val = 2*Sn*Pr/(Sn + Pr). The breakdown of false positives (FPs) by covariate-related versus extrinsic factors is shown on the right. (B) PCA plots of samples generated with Jutils based on estimated PSI values, before and after covariate treatment. Top: DSR comparison; bottom: DSA comparison. Legend: circles, male (“M”) samples; inverted triangles, female (“F”) samples; orange, “control”; and blue, “disease.” Including “biological sex” as covariate in the models removes dependency along PC2 in the DSR comparison, while for the DSA comparison separation among “M” and “F” samples from the same category (“control,” “disease”) is drastically decreased.

Similarly, MntJULiP outperforms DESeq2 in the DSA pairwise comparisons in overall accuracy, with an F-value of 0.886 compared to 0.843 and 0.727, respectively, for the two DESeq2 options. Once again, MntJULiP has the highest precision, 0.913, and near-best sensitivity, 0.860 versus 0.875 for DESeq2. While not entirely eliminating them, both programs significantly reduce the number of covariate-driven (and other) false positives (Fig. 1A and B). Similar results were obtained for the DSR and the DSA multiway comparisons (Supplementary Fig. S2), with MntJULiP consistently showing the highest precision and overall accuracy. (Note that LeafCutter does not allow for multi-way comparisons, whereas DRIMSeq allows for only one covariate, “batch.”) Therefore, MntJULiP models and removes biases due to covariates from the RNA-seq data to consistently achieve high accuracy, especially precision, and is more accurate than its competitors.

Deconvoluting the effects of covariates on human frontal cortex splicing from GTEx RNA-seq data

To illustrate, we applied our methods to 1398 GTEx RNA-seq samples from 13 brain regions in three comparisons. The first comparison, among regions, revealed distinct groupings between the cerebellar, cortex, and basal ganglia regions, which did not change when accounting for the covariates “biological sex” and “age at death,” as was expected (Supplementary Fig. S3).

Second, we compared the 120 frontal cortex RNA-seq samples by age groups (“20s,” …, “70s”). Changes in the frontal cortex in aging contribute to sex-specific differences in the prevalence of neurological disorders [17]. More differences were observed with more distant age groups, consistent with reports of splicing deregulation with aging [18] (Supplementary Fig. S4). As an outlier, the higher number of events between the “20s” and “30s” groups may be due to the small number of samples in those categories (3 and 4, respectively). When “biological sex” was used as a covariate, a significant increase in differences was observed between the “20s” and “40s” groups, indicating a possible mark of sex-specific differentiation. As a mark of robustness, similar trends were observed when using different alignment tools, with and without using reference gene annotations in alignments; notably and expectedly, the use of annotation at the alignment stage increased the number of spliced alignments and, as a consequence, the number of predictions (Supplementary Fig. S5). Further investigating the “20s” versus “40s” comparison, accounting for the covariate increased the number of differentially spliced genes (931 versus 760 without covariate) and revealed additional disease, gene ontology, and pathway categories of enrichment (Supplementary Fig. S6). 
Specifically, while both comparisons identified “Chemdependency/Tobacco use disorder” as a significantly enriched disease class, “Neurogenesis” and “Exocytosis” as biological processes, “Kinase,” “Guanine nucleotide releasing factor,” “Transferase,” and “Actin binding” as molecular functions, and “Protein–protein interaction at synapses” and the brain-specific “Splicing factor NOVA regulated synaptic proteins” pathways, the covariate-aware comparison identified additional categories, including “Endocytosis,” “Insulin secretion,” “Signalling by RHO GTPases,” “VEGFA-VEGFR2 pathway,” as well as a class of 45 “RNA binding” genes, including several spliceosome and splicing regulatory proteins (CASC3, RP9, HNRNPDL, HNRNPUL1, PRPF3, PRPF38B, SRSF4, and SNRNPN) (Supplementary Fig. S7), suggesting differences in the splicing machinery between the two age groups. Further, it identified a group of seven proteins (DNM3, PRKCB, PLCB1, PRKACB, SLC8A1, AP2M1, and DNM1) in the enriched “Endocrine and other factor-regulated calcium reabsorption” KEGG pathway. Lastly, the fact that both comparisons point to “Chemdependency/Tobacco use disorder” as an enriched category indicates the need to account for additional covariates, such as smoking status, to further reduce confounders.

Lastly, we compared the 83 male (“M”) and 37 female (“F”) frontal cortex samples to identify sex-specific differences in splicing. Jutils heatmaps of PSI values revealed a subgroup of 29 samples (13 “F” and 16 “M”) with a distinct alternative splicing pattern, which became evident when regressing for “age at death.” The 81 genes with distinguishing splicing patterns were enriched in categories including endocytosis, membrane trafficking, vesicle-mediated transport, adherens junctions, and brain-derived neurotrophic factor (BDNF) signaling, with broad roles in neuronal survival, differentiation, synaptic plasticity, cellular transport and communication, and tissue architecture. Females were over-represented in the subgroup (13F:16M, or 45% female, compared to 37F:83M, or 31% female, for the entire set), which pointed to a probable distinct splicing program in a subset of individuals with aging (Fig. 2 and Supplementary Fig. S8). Similar DSA analyses of the frontal cortex data revealed distinguishing events, including at the non-coding X Inactive Specific Transcript (XIST) gene, which is involved in X chromosome inactivation in female early development processes, both without and with covariate modeling, thus supporting the ability of the program to identify and retain relevant genes (Supplementary Fig. S9). Therefore, as previously noted [4], DSR and DSA reflect different and complementary views and effects of alternative splicing on the transcriptional and functional outcomes.

Figure 2. Differential splicing analyses of GTEx brain RNA-seq data with MntJULiP and Jutils. Jutils heatmaps of DSR events from the comparison between the frontal cortex female (“F”) and male (“M”) sample groups, without (left) and with (right) “age at death” as covariate. PSI intron values estimated with MntJULiP and uploaded into Jutils were plotted, with rows (events) and columns (samples) clustered using the “weighted” method and “cityblock” similarity metric. The covariate-adjusted heatmap shows a distinct supercluster marked with boxes.

To further validate and understand the effects of incorporating covariates, we functionally analyzed the sets of differentially spliced genes from the male-female comparisons above. We used Metascape, a platform for functional annotation of gene lists that combines over 40 independent knowledgebases, to comparatively analyze functional categories between the no-covariate comparison and when accounting for “age at death” as a confounding factor. For DSR, the age-covariate comparison increased the number of differential splicing events reported from 165 to 282, and the number of genes from 148 to 221 (Supplementary Fig. S10A), thus revealing classes of genes with splicing differences previously obscured by age imbalances between the cohorts. Metascape analysis identified gains in categories associated with sex differences in frontal cortex, including “Membrane organization,” “Brain-derived neurotrophic factor (BDNF) signaling,” “Regulation of synapses,” and “Endocytosis” (Supplementary Fig. S10B and references therein). Intriguingly, the only category to show a decrease in significance when age at death was used as covariate was regulation of messenger RNA (mRNA) splicing via the spliceosome, indicating that splicing differences at these genes could be more likely explained by differences in the “age” distribution between the male and female groups and, by extension, by the general aging process. This finding is consistent with previous reports implicating mRNA splicing, the spliceosome, and splicing regulatory factors in the aging process [18, 19]. It is further in alignment with our earlier finding of “RNA binding” as an enriched category in the “20s-versus-40s” age group comparisons when regressing out effects of biological sex (Supplementary Fig. S7).

Unlike for DSR above, the DSA covariate-adjusted comparison reduced the number of events identified from 90 to 32, and the number of genes from 69 to 22, indicating that the program likely removed genes whose differences in splicing could be accounted for by “age” differences between the male and female groups (Supplementary Fig. S11A). Validating our approach, the only category to show an increase in significance when regressing out donor age is “Extracellular matrix organization,” a potential factor of sex-based dimorphism in brain [20]. Conversely, several categories that were significant in both comparisons, including “Brain development” and “Cellular response to organic cyclic compound,” showed reduced significance after normalizing for “age.” Lastly, several categories, including “Interferon gamma signaling,” “Learning,” “Pathways of neurodegeneration,” and “Axon development,” appeared significant only in the no-covariate comparison. These clusters, related to inflammation, immune response, and neurodegeneration, particularly in Alzheimer’s disease, present sex-based differences that, however, occur specifically during the aging process, and therefore were correctly removed or reduced (Supplementary Fig. S11B).

Discussion and conclusions

Confounders such as sex, age, and other biomedical attributes inherent to RNA-seq datasets from disease and population studies can bias bioinformatics analyses. Nevertheless, tools that can effectively account for their effects are lagging. The R-based tools DRIMSeq and DESeq2 implement sophisticated statistical models, but their accuracy depends on the quality of the input read count matrix, and running them requires specialized expertise. Additionally, DRIMSeq can only model one covariate (“batch”). Currently, LeafCutter is the only other specialized end-to-end method for RNA-seq data, from intron selection to differential splicing testing, that implements covariates; however, it is limited to DSR and to pairwise comparisons. MntJULiP and Jutils are user-friendly command-line tools for end-to-end differential splicing detection and visualization that implement robust intron selection, comprehensively model DSA and DSR differences while accounting for covariate effects, and enable multi-way comparisons. As with all other differential splicing detection methods, limitations include high variability with small numbers of samples and for events with low read counts. In our analyses, MntJULiP effectively removed covariate effects from both simulated and real data to uncover patterns of splicing variation, while being consistently more accurate than its competitors. In particular, analyses of GTEx RNA-seq data from the frontal cortex helped identify patterns of differential splicing across age groups and pointed to a distinct splicing program in a subgroup of individuals. To conclude, MntJULiP and Jutils are highly effective and efficient analysis tools for large-scale complex RNA-seq datasets with confounding factors, and can reveal new insights into disease and population.

Acknowledgements

Computations were performed on the Advanced Research Computing at Hopkins (ARCH) facility supported by the National Science Foundation [OAC 1920103]. We thank two anonymous reviewers for their suggestions to improve our manuscript. This manuscript is the result of funding in whole or in part by the National Institutes of Health (NIH). It is subject to the NIH Public Access Policy. Through acceptance of this federal funding, NIH has been given a right to make this manuscript publicly available in PubMed Central upon the Official Date of Publication, as defined by NIH.

Author contributions: Wui Wang Lui (Conceptualization [equal], Formal analysis [equal], Methodology [lead], Software [lead], Validation [lead], Visualization [lead], Writing – original draft [equal], Writing – review & editing [supporting]), Guangyu Yang (Conceptualization [equal], Methodology [equal], Software [equal], Writing – original draft [supporting], Writing – review & editing [equal]), Zitong He (Formal analysis [supporting], Methodology [supporting], Software [supporting], Writing – review & editing [supporting]), and Liliana Florea (Conceptualization [lead], Formal analysis [equal], Funding acquisition [lead], Methodology [supporting], Resources [lead], Supervision [lead], Validation [equal], Writing – original draft [lead], Writing – review & editing [lead])

Contributor Information

Wui Wang Lui, Department of Computer Science, Johns Hopkins University, Baltimore, MD 21205, United States.

Guangyu Yang, Department of Computer Science, Johns Hopkins University, Baltimore, MD 21205, United States; META, 101 Burlingame Ave, Burlingame, CA 94010, United States.

Zitong He, Department of Computer Science, Johns Hopkins University, Baltimore, MD 21205, United States.

Liliana Florea, Department of Computer Science, Johns Hopkins University, Baltimore, MD 21205, United States; Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD 21205, United States.

Supplementary data

Supplementary data is available at NAR Genomics & Bioinformatics online.

Conflict of interest

None declared.

Funding

This work was supported by the National Institutes of Health [R01GM129085 and R35GM156374 to L.F.]. Funding to pay the Open Access publication charges for this article was provided by the National Institutes of Health [R35GM156374].

Data availability

The tools MntJULiP and Jutils are available from https://github.com/splicebox/MntJULiP and https://github.com/splicebox/Jutils, respectively. Archived versions of the software, scripts, alignment files of simulated data, and results from applying the tools to analysis data, including gene lists, are available from Zenodo (DOIs: 10.5281/zenodo.15875405 and 10.5281/zenodo.14984116).

References

1. Baralle FE, Giudice J. Alternative splicing as a regulator of development and tissue identity. Nat Rev Mol Cell Biol. 2017;18:437–51. 10.1038/nrm.2017.27.
2. Zhang Y, Qian J, Gu C et al. Alternative splicing and cancer: a systematic review. Signal Transduct Target Ther. 2021;6:78. 10.1038/s41392-021-00486-7.
3. Li YI, Knowles DA, Humphrey J et al. Annotation-free quantification of RNA splicing using LeafCutter. Nat Genet. 2018;50:151–8. 10.1038/s41588-017-0004-9.
4. Yang G, Sabunciyan S, Florea L. Comprehensive and scalable quantification of splicing differences with MntJULiP. Genome Biol. 2022;23:195. 10.1186/s13059-022-02767-y.
5. Shen S, Park JW, Lu ZX et al. rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-seq data. Proc Natl Acad Sci USA. 2014;111:E5593–5601. 10.1073/pnas.1419161111.
6. Trincado JL, Entizne JC, Hysenaj G et al. SUPPA2: fast, accurate, and uncertainty-aware differential splicing analysis across multiple conditions. Genome Biol. 2018;19:40. 10.1186/s13059-018-1417-1.
7. Zhang Z, Pan Z, Ying Y et al. Deep-learning augmented RNA-seq analysis of transcript splicing. Nat Methods. 2019;16:307–10. 10.1038/s41592-019-0351-9.
8. Nowicka M, Robinson MD. DRIMSeq: a Dirichlet-multinomial framework for multivariate count outcomes in genomics. F1000Res. 2016;5:1356. 10.12688/f1000research.8900.2.
9. Anders S, Reyes A, Huber W. Detecting differential usage of exons from RNA-seq data. Genome Res. 2012;22:2008–17. 10.1101/gr.133744.111.
10. Yang G, Cope L, He Z et al. Jutils: a visualization toolkit for differential alternative splicing events. Bioinformatics. 2021;37:4272–4. 10.1093/bioinformatics/btab401.
11. The GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat Genet. 2013;45:580–5.
12. Dobin A, Davis CA, Schlesinger F et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. 10.1093/bioinformatics/bts635.
13. Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11:R106. 10.1186/gb-2010-11-10-r106.
14. Kim D, Paggi JM, Park C et al. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 2019;37:907–15. 10.1038/s41587-019-0201-4.
15. Sherman BT, Hao M, Qiu J et al. DAVID: a web server for functional enrichment analysis and functional annotation of gene lists (2021 update). Nucleic Acids Res. 2022;50:W216–21. 10.1093/nar/gkac194.
16. Zhou Y, Zhou B, Pache L et al. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat Commun. 2019;10:1523. 10.1038/s41467-019-09234-6.
17. Zhou X, Cao J, Zhu L et al. Molecular differences in brain regional vulnerability to aging between males and females. Front Aging Neurosci. 2023;15:1153251. 10.3389/fnagi.2023.1153251.
18. Harries LW, Hernandez D, Henley W et al. Human aging is characterized by focused changes in gene expression and deregulation of alternative splicing. Aging Cell. 2011;10:868–78. 10.1111/j.1474-9726.2011.00726.x.
19. Mazin P, Xiong J, Liu X et al. Widespread splicing changes in human brain development and aging. Mol Syst Biol. 2013;9:633. 10.1038/msb.2012.67.
20. Batzdorf CS, Morr AS, Bertalan G et al. Sexual dimorphism in extracellular matrix composition and viscoelasticity of the healthy and inflamed mouse brain. Biology (Basel). 2022;11:230.

Supplementary Materials

lqaf140_supplemental_file.pdf (4.7 MB, PDF)
