F1000Research. 2025 May 30;14:47. Originally published 2025 Jan 8. [Version 2] doi: 10.12688/f1000research.155223.2

Selecting differential splicing methods: Practical considerations for short-read RNA sequencing

Ben J Draper 1,a, Mark J Dunning 2, David C James 1
PMCID: PMC12308171  PMID: 40741371

Version Changes

Revised. Amendments from Version 1

A few clarifications and edits have been made: 1) Focus on short-read RNASeq and reduction in discussion of long-read technology. 2) Clearer conclusions drawn from benchmark reviews and more informative recommendations. 3) Clarification for tools based on developer/community support, programming platform and accessibility.

Abstract

Alternative splicing is crucial in gene regulation, with significant implications in clinical settings and biotechnology. This review article compiles bioinformatics tools for investigating differential splicing from short-read RNA-seq data, offering a detailed examination of their statistical methods, case applications, and benefits. A total of 22 tools are categorised by their statistical family (parametric, non-parametric, and probabilistic) and level of analysis (transcript, exon, and event). The central challenges in quantifying alternative splicing include correct splice site identification and accurate isoform deconvolution of transcripts. Benchmarking studies show no consensus on tool performance, revealing considerable variability across different scenarios. Tools with high citation frequency and continued developer maintenance, such as DEXSeq and rMATS, are recommended for prospective researchers. To aid in tool selection, a guide schematic is proposed based on variations in data input and the required level of analysis. Emerging long-read RNA sequencing technologies are discussed as a complement to short-read methods, promising reduced deconvolution needs and further innovation.

Keywords: Bioinformatics, Alternative Splicing, RNASeq, Transcriptomics, Differential Expression

Introduction

Alternative splicing (AS) is best described as the fine-tuning of gene expression by selectively including or excluding exons (and sometimes retaining introns) as pre-mRNA is processed. With 90-95% of human multi-exon genes estimated to undergo some form of AS, it is a widespread regulatory process in cellular biology. 1 The cell uses a large ribonucleoprotein (RNP) complex known as the spliceosome, which is guided to target sites through the interaction of sequence elements (splice sites, enhancers & silencers and the polypyrimidine tract) and/or splicing factors. Pre-mRNA splicing can also occur without the spliceosome, as in the case of self-splicing group I & II introns, tRNA splicing and trans-splicing. 2 This ultimately results in genome-wide transcript diversity and, subsequently, measurable changes to protein functionality.

Previous research has uncovered the phenotypic consequences of AS in disease. In humans, clinical research has shown AS to be a key instigator in several forms of cancer and neurodegenerative disorders. 3–5 One notable discovery was that mis-spliced isoforms of microtubule-associated protein tau (MAPT) cause abnormal tau accumulation in neurodegenerative tauopathies such as frontotemporal dementia. 6 In cancer, numerous mis-spliced variants of tumour suppressors, apoptotic and angiogenic proteins have been found to contribute to tumour progression. 7, 8 Within the context of aging, the “energy-splicing resilience axis hypothesis” further underscores AS’s prominent role in controlling phenotype. 9 Beyond clinical research, the utility of alternative transcripts for bioengineering purposes has been explored. For example, coexpressing an alternatively spliced version of the transcription factor X-box binding protein 1 (XBP1) in production cell lines has been shown to increase productivity in the biomanufacturing of recombinant proteins. 10–12 In agricultural biotechnology, CRISPR-mediated directed evolution of the spliceosomal component SF3B1 in rice has yielded mutants with improved resistance to splicing inhibitors. 13 Increasingly, the value of AS in both clinical and biotechnology applications has been recognised, highlighting the need for robust bioinformatics pipelines to identify splice variants.

For prospective researchers investigating AS, transcriptomic data are typically generated using next-generation sequencing. Short-read RNAseq is the most commonly used experimental technique to interrogate a transcriptome owing to its versatility and cost-effectiveness. 14, 15 It involves sequencing short fragments of RNA molecules, providing insights into the expression levels of genomic features annotated on a reference genome. These features may be coding sequences, genes, transcripts, exons, introns, codons or even untranslated regions. Notably, the term “gene” refers to the DNA template, while “transcript” denotes the RNA molecules transcribed from it, as per recent nomenclature guidelines. 16

A typical RNAseq pre-processing pipeline consists of quality control (QC), read alignment and quantification before statistical analysis begins. QC assesses the quality of the raw reads using a standardised tool such as FastQC, after which low-quality bases and adapter sequences are trimmed. 17 The reads are then aligned to a reference genome or transcriptome using software such as STAR or HISAT. 18, 19 The resulting alignment files (usually in Sequence Alignment/Map (SAM) format or its binary form, BAM) are summarised into counts per feature, such as genes, transcripts, exons and coding sequences, using a quantification tool such as HTSeq or featureCounts; Salmon can also quantify transcripts directly from raw reads. 20–22 Because raw counts depend on library size and sequencing depth, they are normalised before comparison. Depending on the purpose of analysis, counts may be scaled by the total number of reads (CPM: Counts per Million), by feature length and then rescaled per sample (TPM: Transcripts per Million), by feature length and total mapped fragments (FPKM: Fragments Per Kilobase of transcript per Million mapped fragments, or RPKM for single-end reads), or by a median of ratios (DESeq2’s method). 23 Commonly, a differential expression analysis is then performed at the gene or transcript level between groups of samples to identify statistically significant changes in expression. While gene-level analysis aggregates reads from all transcripts of a gene, transcript-level analysis enables the study of specific isoforms, which is particularly relevant for AS-focused pipelines.
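The normalisation schemes above can be sketched in a few lines of Python. This is a minimal illustration that assumes a small genes-by-samples matrix of raw counts held in nested lists and feature lengths in kilobases; the function names are hypothetical, and the median-of-ratios function follows the general idea of DESeq2’s size factors rather than reproducing its implementation.

```python
import math
import statistics


def library_sizes(counts):
    """Total reads per sample (column) of a genes x samples matrix."""
    return [sum(row[j] for row in counts) for j in range(len(counts[0]))]


def cpm(counts):
    """Counts per million: divide by library size, multiply by 1e6."""
    lib = library_sizes(counts)
    return [[v / lib[j] * 1e6 for j, v in enumerate(row)] for row in counts]


def fpkm(counts, lengths_kb):
    """Fragments per kilobase per million mapped fragments: CPM / length (kb)."""
    return [[v / length for v in row] for row, length in zip(cpm(counts), lengths_kb)]


def tpm(counts, lengths_kb):
    """Transcripts per million: length-normalise first, then rescale each sample to 1e6."""
    rates = [[v / length for v in row] for row, length in zip(counts, lengths_kb)]
    totals = library_sizes(rates)
    return [[r / totals[j] * 1e6 for j, r in enumerate(row)] for row in rates]


def median_of_ratios(counts):
    """Size factors in the spirit of DESeq2's median-of-ratios method.

    Each sample is compared with a pseudo-reference (the per-gene geometric
    mean across samples); genes with any zero count are skipped because
    their geometric mean would be zero.
    """
    n = len(counts[0])
    usable = [row for row in counts if all(v > 0 for v in row)]
    log_ref = [sum(math.log(v) for v in row) / n for row in usable]
    factors = []
    for j in range(n):
        log_ratios = [math.log(row[j]) - ref for row, ref in zip(usable, log_ref)]
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors


counts = [[10, 20], [100, 200], [0, 5]]  # 3 hypothetical genes x 2 samples
lengths_kb = [1.0, 2.0, 0.5]
size_factors = median_of_ratios(counts)  # sample 2 was sequenced twice as deeply
```

Note the design difference: TPM divides by length before scaling, so every sample sums to one million, whereas FPKM scales first and its column totals vary between samples.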

The pre-processing steps for RNA-seq have been extensively researched over many years, and there is a consensus within the community regarding the gold-standard set of tools. Projects like nf-core enable the execution of RNA-seq pre-processing pipelines with minimal intervention and limited bioinformatics expertise. 24 However, these tend to be focused on the use-case of conventional differential expression rather than the more bespoke AS pipelines as discussed here.

A growing repertoire of tools now annotates and quantifies changes to splicing events. Counts of features such as splice sites and exon/intron junctions found in alignment files are commonly used to annotate splicing events. Although the true repertoire of splicing events is difficult to capture, commonly observed events can be categorised into distinct groups. The most common events are exon skipping, retained introns, mutually exclusive exons, and alternative 5′ and 3′ splice sites. Additional regulatory events involve genomic features like alternative transcription start sites (TSS) and polyadenylation sites, which lead to variations in the mRNA 5′ and 3′ untranslated regions (UTRs), which themselves lie within exons. These events are less frequently analysed in standard bioinformatics pipelines, not due to greater biological complexity, but because short-read sequencing with typical library preparation methods (e.g., random hexamer priming) often lacks sufficient coverage of transcript 5′ and 3′ ends. 25 Specialized library preparations, such as Cap Analysis of Gene Expression (CAGE) for 5′ end capture or 3′ end sequencing (e.g., QuantSeq) for polyadenylation analysis, are required for these studies, with tools like CAGEr and DaPars (Dynamic Analysis of Alternative Polyadenylation from RNA-Seq) supporting such niche research. 26, 27 The choice of visualisation for AS depends on the level of detail required in the analysis. If a highly detailed analysis of individual gene structure is needed, splice graphs, sashimi plots and junction maps are commonly used. 28, 29 To visualise changes across groups of transcripts, MA and volcano plots are typically used in much the same way as in differential expression analysis. 23
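Junction evidence of this kind comes from the CIGAR strings of spliced alignments, in which an `N` operation marks a skipped reference region (an intron). The sketch below is a minimal illustration rather than any tool’s implementation: it assumes alignments have already been parsed into hypothetical `(chrom, pos, cigar)` tuples using the 1-based SAM `POS` field, and tallies the reads supporting each junction.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance along the reference (per the SAM specification).
REF_CONSUMING = set("MDN=X")


def junctions_from_cigar(pos, cigar):
    """Return introns as (start, end) pairs, 1-based and inclusive.

    pos is the 1-based leftmost aligned reference base; every 'N'
    operation is treated as a spliced-out intron.
    """
    ref = pos
    introns = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return introns


def count_junctions(alignments):
    """Tally junction support across (chrom, pos, cigar) records."""
    counts = Counter()
    for chrom, pos, cigar in alignments:
        for start, end in junctions_from_cigar(pos, cigar):
            counts[(chrom, start, end)] += 1
    return counts


reads = [("chr1", 100, "50M200N50M"), ("chr1", 120, "30M200N70M")]
print(count_junctions(reads))  # both reads support the same intron
```

Soft clips (`S`) and insertions (`I`) do not move the reference coordinate, which is why they are excluded from `REF_CONSUMING`.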

Current statistical methods for differential splicing

Commonly, researchers are interested in comparisons of two or more groups of samples, known as differential analyses. Differential gene or transcript expression (DGE/DTE) analysis takes raw read counts, normalises or scales them, and tests whether changes in expression levels between biological groups are statistically significant. Differential transcript/exon usage (DTU/DEU) analysis, however, models features within each gene and tests whether the proportion of the gene’s expression attributable to a given exon or transcript changes significantly between groups. Differential splicing event (DSE) methods, on the other hand, use a diverse array of statistical approaches to quantify and infer splicing events. A comprehensive summary of differential splicing tools is provided in Supplementary Table 1 and in the following sections.
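The distinction between DTE and DTU can be made concrete with transcript proportions. In this minimal sketch the gene, isoform names and counts are hypothetical: doubling every isoform changes expression (DTE) but leaves the proportions unchanged (no DTU), whereas shifting reads between isoforms at a constant gene total produces DTU with no gene-level change.

```python
def transcript_proportions(counts_by_tx):
    """Fraction of a gene's (normalised) counts contributed by each transcript."""
    total = sum(counts_by_tx.values())
    return {tx: c / total for tx, c in counts_by_tx.items()}


control = {"tx1": 80, "tx2": 20}   # hypothetical normalised counts
doubled = {"tx1": 160, "tx2": 40}  # every isoform up: DTE without DTU
switched = {"tx1": 50, "tx2": 50}  # same gene total: DTU without DGE

for label, group in [("control", control), ("doubled", doubled), ("switched", switched)]:
    print(label, sum(group.values()), transcript_proportions(group))
```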

Parametric & mixed methods

Differential expression analysis tools emerged in the early 2000s, coinciding with the development of high-throughput technologies such as microarrays. An early example was LIMMA (Linear Models for Microarray Data), developed by Gordon Smyth and colleagues in 2003, which uses a linear modelling framework and empirical Bayes techniques to identify differentially expressed features. 30 Although initially used only for microarrays, its functionality was later extended to RNASeq data, and it has been one of the most cited RNASeq methods. As the field shifted from microarray technology to RNASeq, methods such as DESeq (Differential Expression Analysis for Sequence Count Data) and edgeR were developed to better capture the discrete nature of count data and improve modelling. 23, 30, 31 A major change incorporated in DESeq2 was empirical Bayes shrinkage of gene-wise dispersion estimates, which improves the accuracy of variance estimation (Figure 1). Secondly, GLMs (Generalized Linear Models) replaced simple linear models, as they adapt well to non-normally distributed count data. 23 The flexibility of GLMs allows algorithms to deal effectively with issues such as overdispersion, heteroscedasticity and covariates, and to incorporate shrinkage. To date, GLMs for RNASeq counts usually assume an NB (Negative Binomial) distribution, which confers some strong advantages. The NB distribution captures overdispersion (variance in counts exceeding the mean, as is typical between biological replicates) and accommodates more zero counts than a Poisson model of the same mean, which suits sparse transcript- or exon-level count data. However, limma, DESeq2 and edgeR were not developed specifically to address the challenges of identifying AS.
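The overdispersion described above can be checked numerically. This sketch assumes the common mean/dispersion parameterisation of the NB distribution, in which Var(K) = μ + αμ², and compares it with a Poisson model of the same mean; the parameter values are arbitrary, and this is not DESeq2’s or edgeR’s fitting code.

```python
import math


def nb_pmf(k, mu, alpha):
    """Negative binomial P(K = k) with mean mu and dispersion alpha.

    Uses size r = 1/alpha and p = r / (r + mu), so Var = mu + alpha * mu**2.
    """
    r = 1.0 / alpha
    p = r / (r + mu)
    log_prob = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_prob)


def poisson_pmf(k, mu):
    """Poisson P(K = k); its variance equals its mean."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def moments(pmf, kmax=5000):
    """Mean and variance by direct summation over k = 0..kmax-1."""
    probs = [pmf(k) for k in range(kmax)]
    mean = sum(k * q for k, q in enumerate(probs))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
    return mean, var


mu, alpha = 10.0, 0.5
nb_mean, nb_var = moments(lambda k: nb_pmf(k, mu, alpha))
print(nb_mean, nb_var, mu + alpha * mu ** 2)      # numerical vs. analytical variance
print(nb_pmf(0, mu, alpha), poisson_pmf(0, mu))   # NB puts far more mass on zero
```

Setting α close to zero recovers Poisson-like behaviour, which is why the dispersion parameter is the quantity that shrinkage methods estimate and stabilise.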

Figure 1. Timeline of statistical methods in differential splicing tool development.



Methods are categorized into parametric and non-parametric approaches, grouped by methodological families. The classification is based on the underlying statistical procedures used for modelling or hypothesis testing, as detailed in Supplementary Table 1. Note that some methods incorporate elements of both parametric and non-parametric frameworks, resulting in overlapping features.

DEXSeq, introduced in 2012 by Simon Anders and colleagues and built on the NB GLM framework of the DESeq family, has become the de facto tool for parametric splicing-based analysis. Instead of analysing gene-level differential expression, DEXSeq identifies exons within genes that exhibit significant changes in their usage across conditions. This is particularly useful for studying the exonic composition of alternatively spliced transcripts. The development of tools such as DSGseq, rDiff-parametric, JunctionSeq and SeqGSEA has expanded the functionality of the GLM NB family of differential splicing tools. 32–35 DSGseq takes a holistic approach, considering splicing events not as individual elements but as gene-wise splice graphs that more accurately reflect complex splicing dependencies. 32 rDiff-parametric, on the other hand, uses isoform-specific loci such as restricted exonic regions to identify significant differences in isoform composition. 33 By focusing on exonic regions unique to specific isoforms, rDiff-parametric avoids assigning ambiguous reads to overlapping isoforms. Assigning reads to isoforms is challenging because isoforms of the same gene share most of their sequence, making it difficult to attribute a read from a shared region to a particular isoform without supplementary data. Consequently, full isoform deconvolution becomes less reliable for genes with many isoforms. 36
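The ambiguity of shared reads is what isoform quantifiers resolve with expectation-maximisation (EM). The sketch below is a deliberately simplified EM that assumes equal effective lengths for all isoforms and hypothetical read groupings (“equivalence classes” of compatible isoforms); real quantifiers such as Salmon or RSEM also model transcript length and other biases. It shows both sides of the problem: unique reads let shared reads be apportioned, while isoforms with no distinguishing reads cannot be separated.

```python
def em_isoform_abundance(eq_classes, transcripts, n_iter=1000, tol=1e-10):
    """Estimate expected read counts per isoform with a simple EM.

    eq_classes maps a tuple of compatible transcripts to a read count.
    Assumes equal effective lengths (a simplification).
    """
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    total = sum(eq_classes.values())
    for _ in range(n_iter):
        expected = {t: 0.0 for t in transcripts}
        # E-step: split each class's reads in proportion to current abundances.
        for compatible, n_reads in eq_classes.items():
            denom = sum(theta[t] for t in compatible)
            for t in compatible:
                expected[t] += n_reads * theta[t] / denom
        # M-step: new abundances are the expected read fractions.
        new_theta = {t: expected[t] / total for t in transcripts}
        delta = max(abs(new_theta[t] - theta[t]) for t in transcripts)
        theta = new_theta
        if delta < tol:
            break
    return {t: theta[t] * total for t in transcripts}


reads = {("A",): 30, ("B",): 10, ("A", "B"): 40}  # hypothetical counts
print(em_isoform_abundance(reads, ["A", "B"]))               # shared reads split about 3:1
print(em_isoform_abundance({("A", "B"): 100}, ["A", "B"]))   # no unique reads: stays 50/50
```

The second call illustrates why genes with many near-identical isoforms are hard: without distinguishing reads, the starting guess is never updated.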

A few newer methods, such as DRIMSeq and DTUrtle, use non-parametric or mixed Dirichlet Multinomial Models (DMM), which have been argued to better capture the complex variability of count data and to better estimate isoform proportions 37, 38 ( Figure 1 & Supplementary Table 1). Other methods, such as IsoformSwitchAnalyzeR and some custom DEXSeq workflows, are now modular, allowing users to choose among bioinformatics tools for filtering, hypothesis testing and posterior calculations. 39, 40 An example of parametric analysis in practice was the discovery of a chimeric fusion transcript of PRKACA and DNAJB1 in a rare liver tumour, fibrolamellar hepatocellular carcinoma (FL-HCC), using DEXSeq’s differential exon usage framework. 41 Differential usage of PRKACA exons 2-10, together with decreased usage of DNAJB1 exons 2-3, led the researchers to identify a chimeric transcript in FL-HCC patients. This demonstrated the utility of exon-level analysis in identifying differences in transcript structure that would not be detected by gene- or transcript-level analysis alone.
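A Dirichlet-multinomial model treats a gene’s transcript counts in one sample as multinomial draws whose proportions themselves vary between replicates according to a Dirichlet distribution. This minimal sketch evaluates its log-probability, assuming a parameterisation α = precision × proportions (loosely in the spirit of DRIMSeq’s proportion and precision parameters, but not its code); a small precision allows extra-multinomial variability, and a very large one approaches the ordinary multinomial.

```python
import math


def dm_log_pmf(y, alpha):
    """log P(y | alpha) for counts y under a Dirichlet-multinomial(alpha)."""
    n, a0 = sum(y), sum(alpha)
    out = math.lgamma(n + 1) + math.lgamma(a0) - math.lgamma(n + a0)
    for yk, ak in zip(y, alpha):
        out += math.lgamma(yk + ak) - math.lgamma(ak) - math.lgamma(yk + 1)
    return out


def dm_log_pmf_from_proportions(y, proportions, precision):
    """Same model, parameterised by expected proportions and a precision value."""
    return dm_log_pmf(y, [precision * p for p in proportions])


# Hypothetical counts for a two-isoform gene in one sample.
y = (3, 7)
print(math.exp(dm_log_pmf(y, (1.0, 1.0))))                        # uniform over 0..10: 1/11 up to rounding
print(math.exp(dm_log_pmf_from_proportions(y, (0.3, 0.7), 1e6)))  # close to the multinomial value
```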

Probabilistic & non-parametric methods

Non-parametric or probabilistic techniques such as MAJIQ, SUPPA, Whippet and rMATS frequently use Bayesian inference and/or other probabilistic methodologies. 29, 42–44 By avoiding strong assumptions about the data’s underlying distribution, these methods allow more flexible modelling. Consequently, in contrast to the predominantly standardised parametric exon/transcript-based techniques, event-based methods often showcase a broader array of statistical approaches ( Figure 1). A few common features can be identified, however. Often the features used to annotate events, such as splice sites, exon/intron junctions and splicing quantitative trait loci (sQTLs), are not labelled in gene transfer format (GTF) annotation files and must be derived from the data. This then allows the “percent spliced in” (PSI) to be calculated for each alternative exon: the fraction of transcripts that include the exon among all transcripts that either include or skip it. By comparing PSI values between conditions, differential splicing events can then be identified and explored through splice graphs and sashimi plots. An example of non-parametric tool usage was the mapping of splicing events in the rice (Oryza sativa) transcriptome, which revealed prevalent AS under nutrient-deprived conditions. 45 Importantly, this study used rMATS to reveal the underlying exon-intron structure of key nutrient transporter genes.
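PSI itself is simple arithmetic once inclusion and skipping reads have been counted. The sketch below is a minimal illustration assuming a cassette exon with pre-counted inclusion and skipping reads; the optional length normalisation reflects the general idea (used, for example, by rMATS) that inclusion reads can arise from more positions than skipping reads, but the function names and the plain group averaging are hypothetical and are not any tool’s exact estimator.

```python
import math


def psi(inclusion_reads, skipping_reads, inclusion_len=1.0, skipping_len=1.0):
    """Percent spliced in (as a fraction) for a cassette exon.

    Counts are divided by the number of positions that can produce each
    read type; leaving both lengths at 1.0 gives the plain count ratio.
    """
    inc = inclusion_reads / inclusion_len
    skp = skipping_reads / skipping_len
    if inc + skp == 0:
        return math.nan  # no informative reads: PSI is undefined
    return inc / (inc + skp)


def mean(values):
    return sum(values) / len(values)


def delta_psi(group_a, group_b):
    """Difference in mean PSI between two groups of per-sample PSI values."""
    return mean(group_b) - mean(group_a)


print(psi(30, 10))            # plain ratio
print(psi(30, 10, 2.0, 1.0))  # inclusion reads span two junctions, so weight them by half
print(delta_psi([0.2, 0.4], [0.7, 0.9]))  # mean PSI shift between groups
```

A ΔPSI on its own is only an effect size; the tools discussed here add a statistical model of replicate variability to decide whether the shift is significant.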

Some tools have features suited to specific scenarios. NOISeq is a non-parametric differential expression tool specifically designed to handle small numbers of biological replicates through its noise model. 46 For more complex modelling, tools such as GLiMMPs (Generalized Linear Mixed Model for Pedigree Data with Population Substructure) employ mixed-effects models to account for both fixed and random effects, such as genetic family substructure. 47 Beyond splicing, the modular tool IsoformSwitchAnalyzeR facilitates analysis of the functional consequences of spliced transcripts, such as Nonsense Mediated Decay (NMD) sensitivity, Intrinsically Disordered Regions (IDR) and protein domains. 39 Increasingly, newer approaches, including deep learning-based methods, are being used to improve the accuracy of differential splicing predictions by leveraging publicly available RNASeq data, as with DARTS and Bisbee. 48, 49

Popularity & developer maintenance of methods

To assess the academic popularity of tools, a citation and developer engagement analysis was performed on original research articles indexed in Web of Science (WoS) and on the tools’ GitHub repositories (where available). The assessment spanned 2010 to 2024 and encompassed 19 original papers on various differential splicing tools. Notably, citation counts for these splicing tools were considerably lower than those for conventional RNA-Seq differential expression analysis tools. For instance, while the general-purpose DGE/DTE tool DESeq2 amassed a total of 35,887 citations during the same period, citations for differential splicing tools ranged from 7 to 1300 ( Figure 2). This discrepancy may pose challenges for researchers seeking resources and workflows specific to differential splicing analysis. Additionally, the importance of developer support cannot be overstated, as it directly influences the usability and longevity of software tools. Notably, differential splicing tools such as DEXSeq, EBSeq, rMATS, SUPPA2, and MAJIQ 29, 42, 43, 50, 51 have shown increasing usage and ongoing developer engagement, as evidenced by their growing citation counts and sustained support ( Figure 3; Figure 4). One possible explanation for the lower citation rates of exon/transcript-based methodologies is the broader adoption of general-purpose differential expression workflows, such as DESeq2, that can perform DTE. 23 Researchers may prefer explicit event-based tools for targeted splicing analyses and rely on DTE for transcript-based analyses. While the nuances between DTU and DTE may not be a primary focus for many researchers, the distinction is worth noting in the context of differential splicing analysis.
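The maintenance analysis rests on counting repository commits per year. The authors’ own analysis used the R script github_repos.R in the Zenodo archive; the Python sketch below is only a hypothetical analogue, assuming a local `git` installation and an already-cloned repository (any path passed in is a placeholder), and does not reproduce that script or the WoS citation counts.

```python
import subprocess
from collections import Counter


def commit_years(repo_path):
    """Return the author year of every commit in a local clone (requires git)."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--pretty=format:%ad", "--date=format:%Y"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [int(year) for year in out.split()]


def commits_per_year(years, start=2010, end=2024):
    """Tally commits per year within [start, end], filling empty years with 0."""
    counts = Counter(y for y in years if start <= y <= end)
    return {year: counts.get(year, 0) for year in range(start, end + 1)}


# Hypothetical commit years; in practice: commits_per_year(commit_years("path/to/clone"))
print(commits_per_year([2009, 2012, 2012, 2024, 2025]))
```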

Figure 2. Citation counts of differential splicing tools (2010–2024) from Web of Science (WoS) Data.



Total citation counts for surveyed differential splicing tools (2010–2024) from the Web of Science Data Portal (WoS). Tools are categorized by analysis level: event, exon, or transcript. DRIMSeq’s original paper was excluded from the citation frequency analysis as it was not indexed in WoS. Certain data included herein are derived from Clarivate Web of Science. © Copyright Clarivate 2023. All rights reserved.

Figure 3. Citation trends of differential splicing tools (2010–2024) from Web of Science (WoS) Data.



Annual citation frequency for current differential splicing tools (2010–2024) from Web of Science (WoS). Tools are categorized by analysis level: event, exon, or transcript. DRIMSeq’s original paper is excluded as it is not indexed in WoS. Certain data included herein are derived from Clarivate Web of Science. © Copyright Clarivate 2023. All rights reserved.

Figure 4. Developer maintenance of differential splicing tools.



Annual GitHub repository commits (2010–2024) by category, highlighting community-led maintenance of differential splicing tools. Tools without GitHub pages (MAJIQ, MISO, DSGseq, and dSpliceType) were excluded from the analysis.

The decision between exon/transcript-level (typically parametric) and event-level (typically non-parametric) analyses hinges on several factors, including the particular scientific inquiry, data accessibility, and the level of granularity required to address the research goal. In certain scenarios, integrating both methodologies could offer a more holistic understanding of splicing control mechanisms and their biological significance.

Benchmarking of methods is difficult

To evaluate the quality of differential splicing bioinformatics tools, several benchmarks have been conducted to date. Benchmarking either the accuracy or the computational performance of methods can be challenging for several reasons. A primary obstacle is the lack of ground-truth splicing quantifications. Often, benchmarks rely on small sets of experimentally validated splicing events as a reference. For instance, a 2019 systematic evaluation of 10 differential splicing tools tested 62 qPCR-validated differentially spliced genes across four human and mouse cancer datasets (breast, lung, prostate, and mouse lung). 52 This study found that rMATS and SUPPA2 exhibited higher sensitivity and precision in datasets with large library sizes, high sequencing depth, and low inter-replicate variability, whereas MAJIQ excelled at detecting complex splicing events (e.g., multiple exon skipping) but required more memory and run time. Performance variability was attributed to RNA-seq data characteristics, including library size (total sequenced reads), sequencing depth (average coverage per nucleotide), positional bias (e.g., 3′ bias from poly-A selection), and the quality of reference annotations for the organism under study.
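Scoring a tool against a validated reference set comes down to sensitivity and precision. This minimal sketch assumes the validated set contains both confirmed positives and confirmed negatives, and uses hypothetical gene identifiers; calls outside the validated set cannot be scored at all, which is one reason small truth sets limit benchmark conclusions.

```python
def sensitivity_precision(called, validated_positive, validated_negative):
    """Score a tool's calls against experimentally validated genes.

    Only genes in the validated sets are scored; other calls are ignored.
    """
    called = set(called)
    tp = len(called & validated_positive)
    fn = len(validated_positive - called)
    fp = len(called & validated_negative)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, precision


calls = {"G1", "G2", "G5", "G9"}  # hypothetical genes called by one tool
print(sensitivity_precision(calls, {"G1", "G2", "G3"}, {"G4", "G5"}))
```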

To mitigate these issues, some papers use simulated data to explore the impact of varying replicate numbers, sequencing depth, and inter-replicate variability within the data. 53–55 For example, a benchmark using RSEM-based simulated data derived from a human prostate cancer dataset (GSE22260 56) found that workflows based on DESeq2 and Limma outperformed others in accuracy for high-depth data with multiple replicates, while NOISeq maintained robustness across variable library sizes. 54 Another comparison utilised a combination of experimental and simulated Arabidopsis heat shock RNASeq datasets using the Flux Simulator tool. 57 However, simulated data often fails to capture the complexity of biological data, including outliers and technical biases, limiting its generalizability.

The consensus drawn from these three benchmarks indicates that tool performance varies significantly with data quality and analysis goals. 52, 54 For datasets with large library sizes, high sequencing depth, and low variability, DESeq2/DEXSeq, Limma, and rMATS excel in accuracy and speed, with Limma and NOISeq requiring less memory and run time, making them well suited to large-scale analyses. Conversely, MAJIQ is better suited to complex splicing patterns, including those potentially involving TSS and alternative polyadenylation (APA), despite higher computational demands. Developer updates improve tool functionality, making benchmark results time-sensitive, as newer versions may outperform older ones. Community-led maintenance therefore consistently enhances the functionality and reliability of tools over time. Rather than seeking a single optimal tool for differential splicing analysis, researchers should consider employing a suite of tools tailored to specific questions.

Method recommendations

A diagram outlining optimal tool selection is provided to guide prospective AS researchers ( Figure 5). Researchers should first evaluate the scope and objectives of their analysis. For detecting global, transcriptome-wide changes in transcript usage, transcript-based tools like DEXSeq or DRIMSeq are recommended for differential transcript usage (DTU) analysis, as they model transcript-level counts across the entire transcriptome, leveraging high-quality annotations to identify global shifts in isoform expression. 40 However, when focusing on specific transcripts or splicing events, exon- and event-based tools like rMATS, SUPPA2, or MISO provide greater detail, accurately quantifying exon inclusion or specific splicing events (e.g., exon skipping, alternative splice sites) in datasets with large library sizes and high sequencing depth. Nonetheless, variations in experimental parameters such as sample size or covariate inclusion may necessitate alternative approaches.

Figure 5. Guideline for differential splicing tool selection based on experimental parameters.



Decision tree for differential splicing analysis, categorized by three branches based on the level of analysis. Transcript-based methods are represented in blue, exon-based methods in pink, and event-based methods in yellow.

If the objective is to uncover novel transcripts, an exon-based parametric approach might be better suited. Because exon-level counts come from smaller regions and do not require reads to be assigned to full-length isoforms, this choice circumvents the challenges of isoform deconvolution and the dependence on complete transcript annotation. For general-purpose differential exon usage (DEU) analysis, DEXSeq is widely used owing to its robust statistical framework and active maintenance within Bioconductor, while rMATS is highly cited for its accuracy and speed in quantifying exon inclusion, making the two complementary choices for detailed exon-level analysis. 23, 43 However, particular features of the data may again call for more specialised alternatives. Transcript- and exon-based methods support top-down visualisations such as MA/volcano plots, heatmaps, and proportional transcript/exon graphs, suitable for summarising global or exon-specific changes. If the analysis aims to visualise the inclusion or exclusion of specific exons, introns and splice sites, an event-based protocol would be more appropriate. Generally, tools such as rMATS, SUPPA2 and MISO offer comprehensive and detailed splicing event analysis. 42, 43, 58
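The recommendations in this section can be summarised as a simple lookup. The sketch below is a hypothetical, deliberately simplified encoding of the text’s suggestions (the goal labels are invented for illustration); it does not reproduce Figure 5, and real choices also depend on sample size, covariates and data quality.

```python
def suggest_tools(goal, well_annotated=True):
    """Hypothetical, simplified mapping from analysis goal to candidate tools.

    goal: one of "transcriptome_wide_dtu", "novel_transcripts",
          "exon_usage", "specific_events", "complex_events".
    """
    if not well_annotated:
        return ["LeafCutter"]  # annotation-free option for poorly annotated organisms
    table = {
        "transcriptome_wide_dtu": ["DEXSeq", "DRIMSeq"],
        "novel_transcripts": ["DEXSeq"],            # exon-based parametric approach
        "exon_usage": ["DEXSeq", "rMATS"],          # complementary DEU choices
        "specific_events": ["rMATS", "SUPPA2", "MISO"],
        "complex_events": ["MAJIQ"],                # higher computational demands
    }
    if goal not in table:
        raise ValueError(f"unknown goal: {goal!r}")
    return table[goal]


print(suggest_tools("specific_events"))
```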

Sashimi plots are commonly used to visualise splice junctions from aligned data with events annotated, and they can also be generated separately in IGV. 59 For user-friendly visualisation, MAJIQ offers an HTML-based summary visualiser for complex events such as exitrons or orphan junctions. 29 Usability varies by tool interface: DEXSeq and DRIMSeq are R packages, while rMATS, SUPPA2, and MAJIQ are run from the command line (for example, in a terminal within an IDE such as Visual Studio Code), with MAJIQ and SUPPA2 offering graphical outputs for broader accessibility. For organisms with poor annotation quality, annotation-free methods like LeafCutter are valuable alternatives. 60, 61 Optional steps, such as using Portcullis to filter false splice junctions, can enhance data quality by addressing misalignments common in short-read data. 62 For most analyses, DEU or DTU approaches (e.g., DEXSeq, rMATS) are recommended for their interpretability and robustness, with DTU tools preferred for transcriptome-wide insights and event-based tools for detailed, transcript-specific splicing analysis.
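A crude form of junction filtering can be illustrated by requiring minimum read support and a recognised intron boundary motif (GT-AG, or the rarer GC-AG and AT-AC). This is a minimal sketch with a hypothetical read threshold and a toy reference string; it is not Portcullis’s method, which weighs many more alignment features.

```python
# Donor/acceptor dinucleotides at intron boundaries, read on the transcribed strand.
RECOGNISED_MOTIFS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def passes_filter(genome, start, end, n_reads, min_reads=3):
    """Keep a junction with enough support and a recognised splice motif.

    genome: reference sequence string; start/end: 1-based inclusive intron
    coordinates. Both strands are checked because unstranded libraries do
    not reveal which strand the transcript came from.
    """
    intron = genome[start - 1:end].upper()
    fwd = (intron[:2], intron[-2:])
    rc = revcomp(intron)
    rev = (rc[:2], rc[-2:])
    return n_reads >= min_reads and (fwd in RECOGNISED_MOTIFS or rev in RECOGNISED_MOTIFS)


toy_genome = "AAAA" + "GT" + "CCCCCC" + "AG" + "TTTT"  # hypothetical sequence
print(passes_filter(toy_genome, 5, 14, n_reads=5))  # GT...AG intron with 5 reads
print(passes_filter(toy_genome, 5, 14, n_reads=1))  # too little support
```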

Discussion

While the repertoire of tools for differential splicing (DS) analysis has expanded over the past two decades, their effectiveness remains tied to the capabilities of RNASeq technology. Since 2010, long-read RNAseq, enabled by technologies like Oxford Nanopore Technologies (ONT) and PacBio’s single-molecule real-time (SMRT) sequencing, has offered read lengths of 10 kb to 100 kb, with ultra-long reads reaching 1-2 Mb. 63–65 This allows full-length transcript isoforms to be captured in a single read, bypassing the deconvolution issues caused by multi-mapping reads and improving detection of known transcripts, novel splice variants, and fusion genes. However, long-read sequencing remains costly and is often combined with short-read RNAseq in hybrid approaches to achieve high accuracy (up to 99.5%). 66 Tools like StringTie2 exemplify this hybrid approach, combining short- and long-read data to enhance transcript assembly accuracy by leveraging the precision of short reads and the isoform resolution of long reads. 67 Consequently, short-read-based DS tools, such as IsoformSwitchAnalyzeR’s DEXSeq-based DTU workflow, remain highly relevant, as demonstrated by their successful application to ONT long-read data. 23, 39, 68

The high cost of long-read RNAseq underscores the continued importance of short-read DS tools, especially given the wealth of publicly available short-read RNAseq datasets in repositories like NCBI Gene Expression Omnibus (GEO), 69 EMBL-EBI ArrayExpress, 70 and Sequence Read Archive (SRA). 71 These thousands of datasets enable meta-analyses that yield novel biological insights without the expense of new sequencing. As interest in alternative splicing grows, advances in statistical methods and sequencing technologies are overcoming technical limitations, improving transcript alignment and simplifying computational workflows. Streamlined and modular workflows, such as those provided by Nextflow and the nf-core/rnasplice pipeline, empower researchers to create tailored AS pipelines with minimal setup effort, leveraging containerization to bypass installation challenges. 72 The synergy of cost-effective short-read tools, hybrid strategies, and extensive public datasets ensures a promising future for alternative splicing analysis, deepening our understanding of transcriptomic regulation and its functional significance.

Ethical approval and consent statement

Ethical approval and consent were not required.

Funding Statement

This work was supported by funding from Lonza Biologics Inc & The University of Sheffield. https://www.lonza.com/ I extend my gratitude for their financial assistance in facilitating this research project. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


[version 2; peer review: 2 approved]

Data availability statement

Underlying data

No data associated with this article.

Extended data

Zenodo: Selecting differential splicing methods: Practical considerations https://doi.org/10.5281/zenodo.14293573. 73

The repository contains the following underlying data:

  • Supplementary Table 1.docx: Statistical details on differential splicing tools.

  • citations_2023.csv: WoS citation count for differential splicing tools.

  • citations_year_plot_new.R: R script to visualise citation trends.

  • github_repos_txt: Github repository locations cloned on 20.02.2024.

  • github_repos.R: Github maintenance analysis and visualisation.

  • citations_2023.xlsx

Software availability statement

Archived software available from: 10.5281/zenodo.14293573

github_repos.R

citations_year_plot_new.R

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

References

  • 1. Baralle FE, Giudice J: Alternative splicing as a regulator of development and tissue identity. Nat. Rev. Mol. Cell Biol. 2017 Jul;18(7):437–451. 10.1038/nrm.2017.27
  • 2. Matera AG, Wang Z: A day in the life of the spliceosome. Nat. Rev. Mol. Cell Biol. 2014 Feb;15(2):108–121. 10.1038/nrm3742
  • 3. Singh B, Eyras E: The role of alternative splicing in cancer. Transcription. 2017 Mar 15;8(2):91–98. 10.1080/21541264.2016.1268245
  • 4. Bonnal SC, López-Oreja I, Valcárcel J: Roles and mechanisms of alternative splicing in cancer — implications for care. Nat. Rev. Clin. Oncol. 2020 Aug;17(8):457–474. 10.1038/s41571-020-0350-x
  • 5. Zhang Y, Qian J, Gu C, et al.: Alternative splicing and cancer: a systematic review. Signal Transduct. Target. Ther. 2021 Feb 24;6(1):1–14. 10.1038/s41392-021-00486-7
  • 6. Kar A, Kuo D, He R, et al.: Tau Alternative Splicing and Frontotemporal Dementia. Alzheimer Dis. Assoc. Disord. 2005;19(Suppl 1):S29–S36. 10.1097/01.wad.0000183082.76820.81
  • 7. Yanagisawa M, Huveldt D, Kreinest P, et al.: A p120 Catenin Isoform Switch Affects Rho Activity, Induces Tumor Cell Invasion, and Predicts Metastatic Disease. J. Biol. Chem. 2008 Jun 27;283(26):18344–18354. 10.1074/jbc.M801192200
  • 8. McEvoy J, Ulyanov A, Brennan R, et al.: Analysis of MDM2 and MDM4 Single Nucleotide Polymorphisms, mRNA Splicing and Protein Expression in Retinoblastoma. PLoS One. 2012 Aug 20;7(8):e42739. 10.1371/journal.pone.0042739
  • 9. Ferrucci L, Wilson DM, Donegà S, et al.: The energy-splicing resilience axis hypothesis of aging. Nat. Aging. 2022;2(3):182–185. 10.1038/s43587-022-00189-w
  • 10. Cain K, Peters S, Hailu H, et al.: A CHO cell line engineered to express XBP1 and ERO1-Lα has increased levels of transient protein expression. Biotechnol. Prog. 2013 Jun;29(3):697–706. 10.1002/btpr.1693
  • 11. Johari YB, Estes SD, Alves CS, et al.: Integrated cell and process engineering for improved transient production of a “difficult-to-express” fusion protein by CHO cells. Biotechnol. Bioeng. 2015;112(12):2527–2542. 10.1002/bit.25687
  • 12. Torres M, Dickson AJ: Reprogramming of Chinese hamster ovary cells towards enhanced protein secretion. Metab. Eng. 2022 Jan;69:249–261. 10.1016/j.ymben.2021.12.004
  • 13. Butt H, Eid A, Momin AA, et al.: CRISPR directed evolution of the spliceosome for resistance to splicing inhibitors. Genome Biol. 2019 Apr 30;20(1):73. 10.1186/s13059-019-1680-9
  • 14. Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 2009 Jan;10(1):57–63. 10.1038/nrg2484
  • 15. Stark R, Grzelak M, Hadfield J: RNA sequencing: the teenage years. Nat. Rev. Genet. 2019 Nov;20(11):631–656. 10.1038/s41576-019-0150-2
  • 16. Cunningham ASG, Gorospe M: Striving for clarity in language about gene expression. Nucleic Acids Res. 2024;52(18):10747–10753. 10.1093/nar/gkae764
  • 17. Babraham Bioinformatics: FastQC, A Quality Control tool for High Throughput Sequence Data. [cited 2022 Aug 7].
  • 18. Dobin A, Davis CA, Schlesinger F, et al.: STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013 Jan 1;29(1):15–21. 10.1093/bioinformatics/bts635
  • 19. Kim D, Langmead B, Salzberg SL: HISAT: a fast spliced aligner with low memory requirements. Nat. Methods. 2015 Apr;12(4):357–360. 10.1038/nmeth.3317
  • 20. Patro R, Duggal G, Love MI, et al.: Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods. 2017 Apr;14(4):417–419. 10.1038/nmeth.4197
  • 21. Putri GH, Anders S, Pyl PT, et al.: Analysing high-throughput sequencing data in Python with HTSeq 2.0. Bioinformatics. 2022 May 13;38(10):2943–2945. 10.1093/bioinformatics/btac166
  • 22. Liao Y, Smyth GK, Shi W: featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014 Apr 1;30(7):923–930. 10.1093/bioinformatics/btt656
  • 23. Love MI, Huber W, Anders S: Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014 Dec 5;15(12):550. 10.1186/s13059-014-0550-8
  • 24. Ewels PA, Peltzer A, Fillinger S, et al.: nf-core: Community curated bioinformatics pipelines. bioRxiv. 2019 [cited 2024 Mar 15]; p.610741. 10.1101/610741
  • 25. Shah A, Mittleman BE, Gilad Y, et al.: Benchmarking sequencing methods and tools that facilitate the study of alternative polyadenylation. Genome Biol. 2021 Oct 14;22(1):291. 10.1186/s13059-021-02502-z
  • 26. Kawaji H, Lizio M, Itoh M, et al.: Comparison of CAGE and RNA-seq transcriptome profiling using clonally amplified and single-molecule next-generation sequencing. Genome Res. 2014 Apr;24(4):708–717. 10.1101/gr.156232.113
  • 27. Xia Z, Donehower LA, Cooper TA, et al.: Dynamic analyses of alternative polyadenylation from RNA-seq reveal a 3′-UTR landscape across seven tumour types. Nat. Commun. 2014 Nov 20;5(1):5274. 10.1038/ncomms6274
  • 28. Katz Y, Wang ET, Silterra J, et al.: Quantitative visualization of alternative exon expression from RNA-seq data. Bioinformatics. 2015 Jul 1;31(14):2400–2402. 10.1093/bioinformatics/btv034
  • 29. Vaquero-Garcia J, Barrera A, Gazzara MR, et al.: A new view of transcriptome complexity and regulation through the lens of local splicing variations. Valcárcel J, editor. eLife. 2016 Feb 1;5:e11752. 10.7554/eLife.11752
  • 30. Ritchie ME, Phipson B, Wu D, et al.: Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015 Apr 20;43(7):e47. 10.1093/nar/gkv007
  • 31. Robinson MD, McCarthy DJ, Smyth GK: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010 Jan 1;26(1):139–140. 10.1093/bioinformatics/btp616
  • 32. Wang W, Qin Z, Feng Z, et al.: Identifying differentially spliced genes from two groups of RNA-seq samples. Gene. 2013 Apr 10;518(1):164–170. 10.1016/j.gene.2012.11.045
  • 33. Drewe P, Stegle O, Hartmann L, et al.: Accurate detection of differential RNA processing. Nucleic Acids Res. 2013 May 1;41(10):5189–5198. 10.1093/nar/gkt211
  • 34. Hartley SW, Mullikin JC: Detection and visualization of differential splicing in RNA-Seq data with JunctionSeq. Nucleic Acids Res. 2016 Sep 6;44(15):e127. 10.1093/nar/gkw501
  • 35. Wang X, Cairns MJ: SeqGSEA: a Bioconductor package for gene set enrichment analysis of RNA-Seq data integrating differential expression and splicing. Bioinformatics. 2014 Jun 15;30(12):1777–1779. 10.1093/bioinformatics/btu090
  • 36. Hiller D, Jiang H, Xu W, et al.: Identifiability of isoform deconvolution from junction arrays and RNA-Seq. Bioinformatics. 2009 Dec 1;25(23):3056–3059. 10.1093/bioinformatics/btp544
  • 37. Nowicka M, Robinson MD: DRIMSeq: a Dirichlet-multinomial framework for multivariate count outcomes in genomics. F1000Research. 2016 Dec 6;5:1356. 10.12688/f1000research.8900.2
  • 38. Tekath T, Dugas M: Differential transcript usage analysis of bulk and single-cell RNA-seq data with DTUrtle. Bioinformatics. 2021 Nov 1;37(21):3781–3787. 10.1093/bioinformatics/btab629
  • 39. Vitting-Seerup K, Sandelin A: IsoformSwitchAnalyzeR: analysis of changes in genome-wide patterns of alternative splicing and its functional consequences. Bioinformatics. 2019 Nov 1;35(21):4469–4471. 10.1093/bioinformatics/btz247
  • 40. Love MI, Soneson C, Patro R: Swimming downstream: statistical analysis of differential transcript usage following Salmon quantification. F1000Research. 2018 [cited 2023 Feb 16]. 10.12688/f1000research.15398.1
  • 41. Honeyman JN, Simon EP, Robine N, et al.: Detection of a Recurrent DNAJB1-PRKACA Chimeric Transcript in Fibrolamellar Hepatocellular Carcinoma. Science. 2014 Feb 28;343(6174):1010–1014. 10.1126/science.1249484
  • 42. Trincado JL, Entizne JC, Hysenaj G, et al.: SUPPA2: fast, accurate, and uncertainty-aware differential splicing analysis across multiple conditions. Genome Biol. 2018 Mar 23;19(1):40. 10.1186/s13059-018-1417-1
  • 43. Shen S, Park JW, Lu Z, et al.: rMATS: Robust and flexible detection of differential alternative splicing from replicate RNA-Seq data. Proc. Natl. Acad. Sci. 2014 Dec 23;111(51):E5593–E5601. 10.1073/pnas.1419161111
  • 44. Sterne-Weiler T, Weatheritt RJ, Best AJ, et al.: Efficient and Accurate Quantitative Profiling of Alternative Splicing Patterns of Any Complexity on a Laptop. Mol. Cell. 2018 Oct 4;72(1):187–200.e6. 10.1016/j.molcel.2018.08.018
  • 45. Dong C, He F, Berkowitz O, et al.: Alternative Splicing Plays a Critical Role in Maintaining Mineral Nutrient Homeostasis in Rice (Oryza sativa). Plant Cell. 2018 Oct 1;30(10):2267–2285. 10.1105/tpc.18.00051
  • 46. Tarazona S, Furió-Tarí P, Turrà D, et al.: Data quality aware analysis of differential expression in RNA-seq with NOISeq R/Bioc package. Nucleic Acids Res. 2015 Dec 2;43(21):e140. 10.1093/nar/gkv711
  • 47. Zhao K, Lu Z, Xiang, et al.: GLiMMPS: robust statistical model for regulatory variation of alternative splicing using RNA-seq data. Genome Biol. 2013 Jul 22;14(7):R74. 10.1186/gb-2013-14-7-r74
  • 48. Halperin RF, Hegde A, Lang JD, et al.: Improved methods for RNAseq-based alternative splicing analysis. Sci. Rep. 2021 May 24;11(1):10740. 10.1038/s41598-021-89938-2
  • 49. Zhang Z, Pan Z, Ying Y, et al.: Deep-learning augmented RNA-seq analysis of transcript splicing. Nat. Methods. 2019 Apr;16(4):307–310. 10.1038/s41592-019-0351-9
  • 50. Anders S, Reyes A, Huber W: Detecting differential usage of exons from RNA-seq data. Genome Res. 2012 Oct 1;22(10):2008–2017. 10.1101/gr.133744.111
  • 51. Leng N, Dawson JA, Thomson JA, et al.: EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics. 2013 Apr 15;29(8):1035–1043. 10.1093/bioinformatics/btt087
  • 52. Mehmood A, Laiho A, Venäläinen MS, et al.: Systematic evaluation of differential splicing tools for RNA-seq studies. Brief. Bioinform. 2020 Dec 1;21(6):2052–2065. 10.1093/bib/bbz126
  • 53. Liu R, Loraine AE, Dickerson JA: Comparisons of computational methods for differential alternative splicing detection using RNA-seq in plant systems. BMC Bioinformatics. 2014 Dec 16;15(1):364. 10.1186/s12859-014-0364-4
  • 54. Merino GA, Conesa A, Fernández EA, et al.: A benchmarking of workflows for detecting differential splicing and differential expression at isoform level in human RNA-seq studies. Brief. Bioinform. 2019 Mar 25;20(2):471–481. 10.1093/bib/bbx122
  • 55. Jiang M, Zhang S, Yin H, et al.: A comprehensive benchmarking of differential splicing tools for RNA-seq analysis at the event level. Brief. Bioinform. 2023 May 1;24(3):bbad121. 10.1093/bib/bbad121
  • 56. Kannan K, Wang L, Wang J, et al.: Recurrent chimeric RNAs enriched in human prostate cancer identified by deep sequencing. Proc. Natl. Acad. Sci. 2011 May 31;108(22):9172–9177. 10.1073/pnas.1100489108
  • 57. Griebel T, Zacher B, Ribeca P, et al.: Modelling and simulating generic RNA-Seq experiments with the flux simulator. Nucleic Acids Res. 2012 Nov 1;40(20):10073–10083. 10.1093/nar/gks666
  • 58. Katz Y, Wang ET, Airoldi EM, et al.: Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat. Methods. 2010 Dec;7(12):1009–1015. 10.1038/nmeth.1528
  • 59. Robinson JT, Thorvaldsdóttir H, Winckler W, et al.: Integrative genomics viewer. Nat. Biotechnol. 2011 Jan;29(1):24–26. 10.1038/nbt.1754
  • 60. Li YI, Knowles DA, Humphrey J, et al.: Annotation-free quantification of RNA splicing using LeafCutter. Nat. Genet. 2018 Jan;50(1):151–158. 10.1038/s41588-017-0004-9
  • 61. Benegas G, Fischer J, Song YS: Robust and annotation-free analysis of alternative splicing across diverse cell types in mice. Eyras E, Manley JL, editors. eLife. 2022 Mar 1;11:e73520. 10.7554/eLife.73520
  • 62. Mapleson D, Venturini L, Kaithakottil G, et al.: Efficient and accurate detection of splice junctions from RNA-seq with Portcullis. GigaScience. 2018 Dec 1;7(12):giy131. 10.1093/gigascience/giy131
  • 63. Eid J, Fehr A, Gray J, et al.: Real-Time DNA Sequencing from Single Polymerase Molecules. Science. 2009 Jan 2;323(5910):133–138. 10.1126/science.1162986
  • 64. Derrington IM, Butler TZ, Collins MD, et al.: Nanopore DNA sequencing with MspA. Proc. Natl. Acad. Sci. 2010 Sep 14;107(37):16060–16065. 10.1073/pnas.1001831107
  • 65. Feng Y, Zhang Y, Ying C, et al.: Nanopore-based Fourth-generation DNA Sequencing Technology. Genomics Proteomics Bioinformatics. 2015 Feb 1;13(1):4–16. 10.1016/j.gpb.2015.01.009
  • 66. Amarasinghe SL, Su S, Dong X, et al.: Opportunities and challenges in long-read sequencing data analysis. Genome Biol. 2020 Feb 7;21(1):30. 10.1186/s13059-020-1935-5
  • 67. Kovaka S, Zimin AV, Pertea GM, et al.: Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 2019 Dec 16;20(1):278. 10.1186/s13059-019-1910-1
  • 68. Wright DJ, Hall NAL, Irish N, et al.: Long read sequencing reveals novel isoforms and insights into splicing regulation during cell state changes. BMC Genomics. 2022 Jan 10;23(1):42. 10.1186/s12864-021-08261-2
  • 69. Clough E, Barrett T: The Gene Expression Omnibus database. Methods Mol. Biol. 2016;1418:93–110. 10.1007/978-1-4939-3578-9_5
  • 70. Parkinson H, Kapushesky M, Shojatalab M, et al.: ArrayExpress—a public database of microarray experiments and gene expression profiles. Nucleic Acids Res. 2007 Jan;35(Database issue):D747–D750.
  • 71. Leinonen R, Sugawara H, Shumway M: The Sequence Read Archive. Nucleic Acids Res. 2011 Jan;39(Database issue):D19–D21. 10.1093/nar/gkq1019
  • 72. Di Tommaso P, Chatzou M, Floden EW, et al.: Nextflow enables reproducible computational workflows. Nat. Biotechnol. 2017 Apr;35(4):316–319. 10.1038/nbt.3820
  • 73. Draper BJ: Selecting differential splicing methods: Practical Considerations - R Scripts and Data. Zenodo. 2024. 10.5281/zenodo.14293573
F1000Res. 2025 Aug 28. doi: 10.5256/f1000research.182560.r389377

Reviewer response for version 2

Juan Valcarcel 2, Monica Salinas 1

This review by Draper et al. provides a comprehensive and well-curated overview of the available tools for RNA-Seq-based alternative splicing (AS) analysis. While differential gene expression remains the most common application of RNA-Seq data, the authors successfully bring attention to the expanding repertoire of computational methods dedicated to AS. The manuscript is clearly written and well-structured, and the figures are informative, in particular the decision-tree diagram in Figure 5, which serves as a helpful visual guide for tool selection.

Requested revisions:

1) It is not entirely clear whether the distinction between exon-based and event-based approaches refers solely to whether different types of splicing events (e.g., retained introns, alternative splice sites) or only exons are considered, or whether there is an additional level of distinction. A more explicit definition of this stratification would help improve clarity and guide readers in understanding the differences between both strategies.

2) Vast-tools, a widely used tool in the AS community, is notably absent from the current manuscript. The main publication describing Vast-tools has garnered >400 citations, and the tool is actively maintained with comprehensive documentation available on GitHub. It meets the inclusion criteria set by the authors (i.e., citation impact and repository maintenance). Vast-tools enables PSI-based quantification of >700,000 annotated splicing events and is part of a broader analytical framework that includes matt (for downstream analysis of AS event features) and VastDB (a comprehensive database of AS profiles). Including this tool in the review would provide a more complete representation of the field and would be useful for readers.

Associated references: see References 1 and 2 at the end of this report.

3) Sequencing depth is a critical parameter that is often underappreciated in splicing analyses. The standard 30M reads per sample, typically sufficient for gene expression studies, are generally inadequate for reliable quantification of splicing events—particularly for low-abundance isoforms. This limitation should be emphasized more clearly in the manuscript, especially in the context of data generation and when selecting publicly available datasets. Many public RNA-Seq datasets were not originally designed for AS analysis, which can severely limit downstream interpretations when re-analysed.

4) About Tools Comparison:

4.1) While the statistical frameworks underlying each tool are briefly mentioned, a more detailed explanation of their analytical approaches (beyond simply naming the statistical test used) would enhance the review’s educational value, especially for non-expert readers.

4.2) The manuscript could benefit from a more explicit discussion on whether each tool relies on predefined annotations or performs de novo detection of splice junctions. This distinction is crucial depending on the biological question (e.g., whether the aim is to identify novel or neojunctions, or to compare known events across conditions).

4.3) Similarly, an overview comparing the output metrics used by the different tools would be highly informative and would help readers determine which software best suits their analysis goals. In those cases that do not rely on PSI values, which alternative metrics are used?

5) On page 5, when comparing expression normalization metrics such as CPM, TPM, and RPKM, a brief explanation of the contexts in which each metric is most appropriate would be helpful.

6) In the discussion of long-read sequencing technologies, it would be useful to indicate which of the reviewed tools are compatible with long-read RNA-Seq data and which are restricted to short-read inputs. This information could, for example, be conveyed as an additional annotation in one of the existing figures (e.g., Figure 5), rather than in the main text.

Is the review written in accessible language?

Yes

Are all factual statements correct and adequately supported by citations?

Yes

Are the conclusions drawn appropriate in the context of the current research literature?

Yes

Is the topic of the review discussed comprehensively in the context of the current literature?

Yes

Reviewer Expertise:

Alternative splicing, RNA biology

We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.

References

  • 1. An atlas of alternative splicing profiles and functional associations reveals new regulatory programs and genes that simultaneously express multiple major isoforms. Genome Res. 2017;27(10):1759–1768. 10.1101/gr.220962.117
  • 2. Matt: Unix tools for alternative splicing analysis. Bioinformatics. 2019;35(1):130–132. 10.1093/bioinformatics/bty606
F1000Res. 2025 Aug 13. doi: 10.5256/f1000research.182560.r397528

Reviewer response for version 2

Zhaoqi Liu 1

Summary:

This manuscript provides a comprehensive review of differential alternative splicing analysis methods primarily based on short-read RNA sequencing data. The authors cover a wide range of computational tools and statistical frameworks, including both parametric and non-parametric approaches, and discuss their applications, strengths, and limitations. Overall, the article offers a valuable resource for researchers aiming to select appropriate tools for AS analysis and to understand the landscape of current methodologies.

Minor Comments:

  1. User Experience and Usability: It would strengthen the review to include a systematic comparison of installation complexity, dependency requirements, and user-friendliness across the reviewed tools. Furthermore, a discussion of each tool’s visualization capabilities, such as graphical user interfaces and scripting options, would greatly benefit non-expert users and enhance the practical guidance of the review.

  2. Computational Resource Requirements: Given that large-scale RNA-Seq datasets are increasingly common, the authors are encouraged to comment on the computational efficiency of the tools, including CPU and memory demands and support for multi-threading or distributed computing. This would provide readers with a better understanding of which tools are most suitable for extensive datasets.

  3. Future Directions: Besides short-read and long-read sequencing integration, more elaborate coverage of emerging trends would be valuable. Specifically, a deeper analysis of deep learning techniques for splicing event detection and interpretation should be considered. Additionally, current challenges and prospects in single-cell splicing analysis could be addressed in more detail, especially how short-read tools are applied in single-cell contexts by aggregating data into pseudo-bulk samples.

Conclusion:

This manuscript is well-organized and informative. Addressing the above points would further enhance its clarity, practical utility, and relevance to current and future research trends in alternative splicing analysis.

Is the review written in accessible language?

Yes

Are all factual statements correct and adequately supported by citations?

Yes

Are the conclusions drawn appropriate in the context of the current research literature?

Yes

Is the topic of the review discussed comprehensively in the context of the current literature?

Yes

Reviewer Expertise:

Computational cancer genomics with a focus on RNA splicing dysregulation in human cancer

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

F1000Res. 2025 Jul 29. doi: 10.5256/f1000research.182560.r397523

Reviewer response for version 2

Julia Olivieri 1

This paper provides a review of alternative splicing (AS) analysis software for short-read RNA sequencing data. Tool popularity and active maintenance are compared using metrics such as citation count and number of GitHub commits. The flowchart provided in Figure 5 is especially helpful for researchers determining which approach is best for their use case. This review is written in accessible language, and breaks the different approaches down into helpful categories, such as parametric vs nonparametric and differential transcript expression (DTE) vs differential exon usage (DEU) vs differential splicing events (DSE). 

This paper will be a useful resource as alternative splicing analysis on short-read RNA sequencing data becomes more prevalent. However, I have several concerns with the paper that I recommend addressing.

Concerns that should be addressed to make the article scientifically sound:

  • The abstract states that 22 tools are analyzed. However, the tools are not fully consistent throughout. DESeq2 is mentioned in the supplemental table, although it is not specifically optimized for AS analysis, and similar tools such as LIMMA and edgeR are not included in the table. Also, WHIPPET is included in the table and mentioned in the text but not included in any of the figures, and Cuffdiff2 is mentioned in the table and Figure 1, but not elsewhere. Keeping the tool list consistent would aid the clarity of the paper.

  • DRIMSeq and DTUrtle are categorized as parametric models, but the text says that they use non-parametric models. This should be clarified.

  • Bisbee is stated to be a deep-learning method, but this is not supported by the citation.

Additional recommendations:

  • In Figure 5, the “isoform-based analysis” diamond breaks the flow of the chart: one cannot trace a path from the top left to this diamond. I recommend re-positioning it to fit into the flow chart more naturally. Also, I recommend adding arrows to SplicingCompass and dSpliceType in the flowchart based on when they should be used (unless they would never be recommended).

  • Understanding the difference between DGE/DTE, DTU/DEU, and DSE is critical to fully understanding the paper. A figure to go along with the in-text description would be very helpful to clarify this for the reader.

  • Visualization tools are mentioned several times in the manuscript. It would be helpful to include example plots, at least an example sashimi plot, because many readers are likely unfamiliar with these plot types.

  • The text states about nonparametric models that “By avoiding assumptions about the data’s underlying distribution, these methods enable more sophisticated modeling.” Rather than “enable,” I would say “require.” 

  • It is stated that the lower citation counts of software for AS analysis compared to DGE analysis “may pose challenges for researchers seeking resources and workflows specific to differential splicing analysis.” I am not sure why fewer citations would pose challenges - I would suggest clarifying or changing the wording (I would understand if fewer options posed challenges, or if lower usage meant less developer support).

  • There are several typos, words missing, and grammatical errors that could be fixed with another round of proofreading. For example, in the supplemental table “riff-parametric” should be “rdiff-parametric” and “To assess the academic popularity of tools, a citation and developer engagement analysis of original research articles within the Web of Science (WoS) domain and the respective GitHub website domains (if applicable)” is a sentence fragment.

  • The section about long-read sequencing in the discussion feels somewhat out of place based on the short-read focus of the rest of the paper. I recommend shortening this section.

Is the review written in accessible language?

Yes

Are all factual statements correct and adequately supported by citations?

Partly

Are the conclusions drawn appropriate in the context of the current research literature?

Yes

Is the topic of the review discussed comprehensively in the context of the current literature?

Yes

Reviewer Expertise:

Alternative splicing analysis, bioinformatics, biostatistics, single cell RNA sequencing

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

F1000Res. 2025 Jun 27. doi: 10.5256/f1000research.182560.r388756

Reviewer response for version 2

Charlotte Capitanchik 1

I find the updated article much improved.

Is the review written in accessible language?

Partly

Are all factual statements correct and adequately supported by citations?

Partly

Are the conclusions drawn appropriate in the context of the current research literature?

Partly

Is the topic of the review discussed comprehensively in the context of the current literature?

Yes

Reviewer Expertise:

bioinformatics, splicing, RNA biology

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

F1000Res. 2025 Apr 25. doi: 10.5256/f1000research.170367.r375800

Reviewer response for version 1

Charlotte Capitanchik 1

The article by Draper and colleagues presents a well-researched and well-organised overview of differential splicing analysis tools for short-read RNA sequencing, which will be valuable for many researchers. The practical focus is excellent, I especially like the section on developer maintenance of tooling, which is often neglected in software review articles. The decision tree in Figure 5 is also a great framework for navigating this crowded space and I’m sure will be useful to many. 

Despite this, in many places the use of language is imprecise: words are missing or used incorrectly, and sentences are too vague, perhaps from being over-simplified in an attempt to make the text more understandable. I also find that some of the recommendations are not adequately supported by evidence. Please find below some comments that I hope will be helpful:

Major comments

  • The method benchmarking section needs some attention. ‘Scientific accuracy’ should be replaced by just ‘accuracy’. ‘Computational power’ should be specified; presumably you mean performance characteristics such as memory usage and run time. “Lack of ground truth to set as a reference to compare to..” is unnecessary; just say a lack of ground truth splicing quantifications. Library size and sequencing depth are the same thing? Library size and positional bias are qualities of the RNA-seq data itself, not the bioinformatic pre-processing steps. Annotation quality is not a ‘pre-processing’ step; this is a feature of the organism under study. “Achieve finer control over the ground truth” could be rephrased to something more meaningful like ‘to explore the impact of increasing variability between replicates, changing replicate numbers, sequencing depth’, etc. “The consensus drawn from these three benchmarks is that the performance of differential splicing tools exhibits considerable variability depending on the outlined factors.” This seems weak; perhaps a more nuanced conclusion can be drawn. When data is very good (deep sequencing, low variability), which tool performs best? Which tools have the lowest run times, compute requirements, etc.? I don’t think it’s sufficient to simply say it’s complicated; please dive more into the details here. “The ongoing evolution and upkeep of tools by developers introduce a time-dependent aspect to benchmarking. Community-led maintenance efforts consistently enhance the functionality and reliability of tools over time.” I’m not sure what this means; please clarify.

  • The discussion brings in developments in long-read sequencing and is quite nice. I would suggest making this a section of its own and expanding on what is already written. Alternatively, I would consider cutting it back a bit and changing the title of the article to reflect a focus on short-read sequencing data. Perhaps one point for the discussion is that, whilst new methods are developing, there remain hundreds of thousands of publicly available short-read RNA sequencing datasets through which novel biological insights can still be made.

  • In the recommendations section: “For instance, if the aim is to identify known transcripts, it is advisable to opt for a parametric transcript-based tool like DEXSeq or DRIMSeq and execute a DTU study following Michael Love’s protocol.37” The citation does not support the statement. Why shouldn’t researchers opt for an exon-based approach when the transcriptomic annotations are good? Also in this section, DEXSeq is recommended for DEU analysis, when rMATS is the most highly cited splicing tool and provides accurate quantifications of exon usage. “Overall, event-based methods are more suited to advanced programmers owing to their use of command-line tools over interpreters that use IDEs (Integrated development environments). For most analyses, however, a DEU or DTU-based analysis is recommended for simple interpretability and robustness.” I don’t understand: how is DEXSeq easier to use than rMATS (for example), or MAJIQ, which has extensive graphical reporting? Are you saying this because DEXSeq is an R package, so you can use RStudio? This doesn’t seem like a particularly helpful argument; I can run rMATS or MAJIQ or any of these using a bash script in Visual Studio Code, which is also an IDE.

Minor comments

  • The abbreviation for alternative splicing (AS) is defined several times throughout the text and is sometimes used, sometimes not; please be consistent.

  • “The proposed advantage of this approach is in the smaller exonic regions rather than full isoform deconvolution.” Rephrase for clarity; presumably you mean that, by focusing on regions unique to distinct isoforms, the tool avoids the issue of assigning ambiguous reads to isoforms.

  • “A few newer methods such as DRIMSeq and DTUrtle have progressed onto non-parametric or mixed Dirichlet Multinomial Models (DMM).” ‘Progression’ suggests there is some kind of hierarchy; you can just say that these models ‘use’ other distributions.

  • “More complex regulatory events involve genomic features beyond exons and introns, such as alternative promoter and polyadenylation sites, which result in varying mRNA 5′ and 3′ UTR ends. However, these events are seldom included in most bioinformatics analyses, tools such as CAGER (Cap Analysis of Gene Expression) and DaPars (Dynamic Analysis of Alternative PolyAdenylation from RNA-Seq) are available for niche research.” I wouldn’t say these events are more complex from a biological standpoint. The issue in the analysis of alternative TSS use and APA is that short-read sequencing with typical library preparation methods (e.g. random hexamer priming) won’t have good coverage of exact transcript 5′ and 3′ ends. Therefore, library preparations with mRNA cap capture (CAGE) or 3′ end sequencing (e.g. QuantSeq) are typically used when this is the analysis goal. Also, to be a pedant, 3′ UTRs and 5′ UTRs are exons.

  • The discussion mentions Nextflow, and nf-core is mentioned earlier, but it might be nice to specifically mention the efforts of nf-core/rnasplice. As you know, one of the benefits of these pipelines is that everything is containerised, so you don’t have to mess about installing everything. Generally speaking, one of the barriers to using tools can be installation; the presented tools vary widely in how much developer investment has gone into making them easy to install. Some are in Bioconda or Bioconductor and have containers; for others you have to contact the authors to get permission to download (MAJIQ!). This might be related to the number of citations that tools get. It would be nice (but not necessary) to address this too.

Is the review written in accessible language?

Partly

Are all factual statements correct and adequately supported by citations?

Partly

Are the conclusions drawn appropriate in the context of the current research literature?

Partly

Is the topic of the review discussed comprehensively in the context of the current literature?

Yes

Reviewer Expertise:

bioinformatics, splicing, RNA biology

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

F1000Res. 2025 May 21.
Ben Draper 1

Dear Dr. Capitanchik,

Thank you for your thorough and insightful review of our manuscript. Your comments on the benchmarking, recommendations, long-read discussion, and minor edits have significantly improved the manuscript’s clarity and accuracy. We have incorporated minimal changes to address your concerns, ensuring the revisions align with your suggestions while maintaining the manuscript’s focus.

For the benchmarking section, we corrected terminology (“accuracy” instead of “scientific accuracy,” “computational performance” instead of “computational power”), clarified that library size, positional bias, and annotation quality are dataset characteristics, not pre-processing steps, and rephrased vague terms (e.g., “lack of ground truth” to “lack of comprehensive ground truth splicing quantifications”). We added specific examples drawn from the benchmarks (e.g., DEXSeq, rMATS, NOISEQ) to strengthen the conclusions.

In the recommendations section, we clarified our positions on programming environments and on developer and community support, making sure not to discriminate harshly against command-line-based tools. We believe this is still worth mentioning, however, as in our experience the programming platform matters for accessibility.

In the discussion, we opted to shorten the long-read section for brevity and to focus the article more on short-read sequencing. We have added a few minor points in agreement with Dr. Donega. We agree with all the minor edits (e.g., consistent AS acronym usage, clarified rDiff-parametric, revised TSS/APA), which were made as requested.

We believe these changes should address your concerns effectively.

Best wishes

Ben J. Draper

University of Sheffield

F1000Res. 2025 Apr 22. doi: 10.5256/f1000research.170367.r375793

Reviewer response for version 1

Stefano Donega 1

In this review, Dr. Draper, Dunning, and James provide a very good, comprehensive, and well-written summary of the tools applied to the investigation of alternative splicing, with a thorough literature perspective and a detailed guide to choosing the most appropriate statistical methods and software. While I really appreciated and enjoyed reading the manuscript, I would like to provide a few comments that I believe would enhance and elevate the quality of the work:

  • In general, the entire manuscript discusses methods that directly apply to short-read platforms. Therefore, I think this should be better highlighted both in the manuscript title and throughout the whole review.

  • The long-read platforms appear only in the discussion section. I recommend that the authors dedicate a separate section to them, independent of the discussion, while using the discussion to connect the main findings investigated in the main text.

Now, I will provide some minor comments:

  • In a recent Nature Aging paper, Ferrucci et al. 2022 (Ref 1) discussed the "energy-splicing resilience axis hypothesis of aging," which is worth mentioning in the introductory paragraph on the importance of splicing.

  • There have been efforts to clarify modern nomenclature in gene expression studies, and guidelines were recently proposed to increase precision and clarity when communicating about gene expression, most notably to reserve 'gene' for the DNA template and 'transcript' for the RNA transcribed from that gene (Cunningham ASG, et al., 2024 [Ref 2]). I suggest the authors consider aligning some definitions found in the manuscript with these guidelines.

  • There is no mention of the possibility of combining short- and long-read sequencing to enhance the quantity and quality of results. I strongly suggest the authors include in their review a section on "StringTie", which utilizes both short and long RNA-seq reads for transcript assembly as a hybrid strategy (Shumate A, et al., 2022 [Ref 3]).

After these improvements, I am confident this article will be highly cited in the field.

Is the review written in accessible language?

Yes

Are all factual statements correct and adequately supported by citations?

Yes

Are the conclusions drawn appropriate in the context of the current research literature?

Partly

Is the topic of the review discussed comprehensively in the context of the current literature?

Yes

Reviewer Expertise:

Aging, muscle, mitochondria, energy, hypoxia, exercise

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

References

  • 1. The energy-splicing resilience axis hypothesis of aging. Nat Aging. 2022;2(3):182-185. doi: 10.1038/s43587-022-00189-w
  • 2. Striving for clarity in language about gene expression. Nucleic Acids Res. 2024;52(18):10747-10753. doi: 10.1093/nar/gkae764
  • 3. Improved transcriptome assembly using a hybrid of long and short reads with StringTie. PLoS Comput Biol. 2022;18(6):e1009730. doi: 10.1371/journal.pcbi.1009730
F1000Res. 2025 May 21.
Ben Draper 1

Dear Dr. Donega,

First of all, thank you for reviewing the article. I appreciate the time you took and the constructive feedback you gave me to improve this work. 

The title was revised to “Selecting Differential Splicing Methods: Practical Considerations for Short-Read RNA Sequencing” to emphasise short-read platforms, and the abstract and introduction now explicitly state this focus. I am hesitant to expand and write a full section on long-read technology, as this isn't really my field of expertise. Therefore, we decided to streamline this section in line with Dr Capitanchik's recommendations while weaving in hybrid approaches that combine current short-read technologies with long reads.

I agree with the minor points and have addressed these by including the recommended citations in the introduction and discussion. 

These changes align the manuscript with your recommendations, maintaining its comprehensive scope while clarifying its primary focus on short-read RNA-seq. 

Sincerely,

Ben J. Draper

University of Sheffield

Associated Data


    Data Availability Statement

    Underlying data

    No data associated with this article.

    Extended data

    Zenodo: Selecting differential splicing methods: Practical considerations https://doi.org/10.5281/zenodo.14293573. 73

    The repository contains the following extended data:

    • Supplementary Table 1.docx: Statistical details on differential splicing tools.

    • citations_2023.csv: WoS citation count for differential splicing tools.

    • citations_year_plot_new.R: R script to visualise citation trends.

    • github_repos_txt: GitHub repository locations cloned on 20.02.2024.

    • github_repos.R: GitHub maintenance analysis and visualisation.

    • citations_2023.xlsx


