PLOS Computational Biology. 2020 Sep 8;16(9):e1008195. doi: 10.1371/journal.pcbi.1008195

VALERIE: Visual-based inspection of alternative splicing events at single-cell resolution

Wei Xiong Wen 1,2, Adam J Mead 1,3, Supat Thongjuea 2,3,*
Editor: Mihaela Pertea4
PMCID: PMC7500686  PMID: 32898151

Abstract

We present VALERIE (Visualising alternative splicing events from single-cell ribonucleic acid-sequencing experiments), an R package for visualising alternative splicing events at single-cell resolution. For any specified genomic region corresponding to an alternative splicing event, VALERIE generates an ensemble of informative plots to visualise cell-to-cell heterogeneity of alternative splicing profiles across single cells and performs statistical tests to compare percent spliced-in (PSI) values across user-defined groups of cells. Among the features available, VALERIE displays PSI values in lieu of read coverage; PSI values are more suitable for representing alternative splicing profiles for the large number of samples typically generated by single-cell RNA-sequencing experiments. VALERIE is available on the Comprehensive R Archive Network (CRAN): https://cran.r-project.org/web/packages/VALERIE/index.html.


This is a PLOS Computational Biology Software paper.

Introduction

Technological advances in high-throughput next-generation transcriptomic sequencing have unravelled gene expression signatures underpinning physiological and pathological processes [1–3]. Alternative splicing represents an additional and underappreciated layer of complexity underlying gene expression profiles [4–6]. To date, alternative splicing has primarily been investigated using bulk RNA-sequencing. Alternative splicing analysis at single-cell resolution is an emerging area of research that promises to provide novel biological insights previously missed by bulk RNA-sequencing [7, 8]. Notably, single-cell analysis has shown differential isoform usage within apparently homogeneous cell populations and revealed hidden subpopulations of cells that could not be distinguished by conventional approaches such as cell-surface markers [8–11]. For example, single-cell analysis of mouse embryonic stem cells (ESCs) revealed variation in isoform usage reflecting changes in the dynamic state of cells, such as the cell cycle [11].

To explore alternative splicing events, visual inspection of sequencing read coverage in a genome browser is the usual practice prior to, or complementary to, laboratory-based validation of gene expression or alternative splicing profiles, such as by reverse transcription quantitative polymerase chain reaction (RT-qPCR) or single-molecule fluorescence in situ hybridisation (smFISH). Existing genome browser visualisation tools are optimised for small-scale bulk RNA-sequencing datasets [12–14]. Present approaches for visualising alternative splicing events in single cells include aggregating single cells by cell types [14, 15], presenting only a subset of single cells [16], or presenting all single cells in the study [8, 17]. The first and second approaches do not capture cell-to-cell heterogeneity in the entire population of cells, which is a key component of single-cell data, whereas the third approach makes it difficult to delineate overall alternative splicing patterns across different cell populations. Moreover, read coverage, which is commonly employed to visualise gene expression profiles, is not suitable for representing alternative splicing events. Read coverage across a genomic locus is represented as bar graphs in which the height of each bar is proportional to the number of sequencing reads spanning that position [13].

Percent spliced-in (PSI) is defined as the fraction of sequencing reads supporting the included isoform or exon and is more relevant for representing alternative splicing profiles [18, 19]. For example, a PSI of 0.50 (or 50%) means half of all sequencing reads support the included isoform or exon. In addition to read coverage, sashimi plots include arcs connecting two splice sites, and the width of each arc is proportional to the number of sequencing reads supporting the splice junction [12–14]. Nevertheless, the percentage, rather than the absolute number, of sequencing reads connecting two splice sites may be a more intuitive measure of the degree of exon inclusion.
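The PSI definition above can be illustrated with a minimal Python sketch. It is not part of VALERIE (which is implemented in R); the function name and the read counts are hypothetical and chosen only to reproduce the 0.50 example in the text.

```python
def percent_spliced_in(inclusion_reads: int, exclusion_reads: int) -> float:
    """Fraction of reads supporting the included isoform or exon.

    Both arguments are hypothetical read counts for a single event
    in a single cell.
    """
    total = inclusion_reads + exclusion_reads
    if total == 0:
        # PSI is undefined when no reads cover the event.
        raise ValueError("no reads cover the event")
    return inclusion_reads / total


# Half of all reads support inclusion, so PSI is 0.50 (50%).
print(percent_spliced_in(30, 30))
# Three quarters of all reads support inclusion.
print(percent_spliced_in(45, 15))
```

Because PSI is a ratio, two cells with very different sequencing depth can share the same PSI, which is why it is easier to compare across many cells than raw read coverage.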

An ideal tool for visualising single-cell alternative splicing data would incorporate a quantification of PSI across all individual single cells (not a single cell ensemble) with a statistical test to identify significant differences between cell populations. Here, we introduce VALERIE, a tool which incorporates these features, to complement existing next-generation sequencing visualisation platforms for inspecting cell-to-cell heterogeneity of alternative splicing events across different cell populations at single-cell resolution.

Design and implementation

In DNA-sequencing, split reads indicate genetic mutations, specifically deletions. In RNA-sequencing, split reads spanning a genomic locus indicate alternative splicing events such as exon-skipping. Therefore, split reads, in addition to read coverage, need to be taken into consideration for alternative splicing analysis and, consequently, for visualisation of the events [20–22]. VALERIE is implemented in R. As input, it requires sorted binary alignment map (BAM) files with their corresponding index files, a sample information file specifying the BAM file names and the user-defined cell group for each single cell, and an exon information file specifying the types of alternative splicing events and their corresponding genomic coordinates. Single-cell groups can be known a priori from fluorescence-activated cell sorting (FACS) gating of cell-surface markers or can be assigned using unsupervised clustering approaches and gene signatures from RNA-sequencing data [9, 10]. Genomic coordinates corresponding to alternative splicing events can be extracted from splicing detection and quantitation tools such as BRIE and MISO [12, 16]. VALERIE supports different types of alternative splicing events. These include skipped-exon (SE), mutually exclusive exons (MXE), retained-intron (RI), alternative 5’ splice site (A5SS), and alternative 3’ splice site (A3SS).

Split reads are needed to infer the degree of exon inclusion, so the implementation must be able to interpret gapped alignments. In SAM/BAM format, split reads are characterised by reference-skipping CIGAR operations ("N"). To this end, we used the function ‘readGAlignments’ [23] to take gapped alignments into account when reading BAM files. To enable fast and efficient reading of BAM files, we used the function ‘ScanBamParam’ together with GenomicRanges [23] to selectively read the regions corresponding to the alternative splicing events defined in the user-supplied exon information file. For each single cell, PSI values at each genomic coordinate supported by at least 10 reads are then computed as the proportion of non-split reads over the total number of reads, where the total number of reads is the sum of non-split reads and split reads [24, 25]. PSI values at each genomic coordinate across the alternative exon and its constitutive exons for all single cells are represented in a heatmap. A line graph summarises the PSI values at each genomic coordinate for each user-defined group of single cells, using the mean as the summary statistic. Significant differences in PSI values at each genomic coordinate between user-defined groups of single cells can be assessed using the t-test or the Wilcoxon rank-sum test for two-group comparisons. Analysis of variance (ANOVA) or the Kruskal-Wallis test can be performed for comparisons of more than two groups. P-values can be corrected for multiple testing using the options of the ‘p.adjust’ function. Mean PSI and p-value line graphs are implemented using the ggplot2 package. The workflow of data processing steps is described in S1A Fig.
For each alternative splicing event, VALERIE generates an image consisting of three graphical components, as shown in Fig 1: a heatmap of PSI values from each single cell, organised by user-defined group, and line graphs of mean PSI and p-values along the genomic coordinates of the event for the groups of single cells.
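The per-coordinate PSI computation described above can be sketched in Python as follows. This is an illustrative re-implementation under stated assumptions, not VALERIE's R code: reads are given as hypothetical (leftmost 1-based position, CIGAR string) pairs rather than read from a BAM file, the function name is invented, and deletions ("D") are counted as non-split coverage, which the text does not specify.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
MIN_READS = 10  # coverage threshold stated in the text


def psi_per_position(reads, region_start, region_end, min_reads=MIN_READS):
    """Return {position: PSI or None} for region_start..region_end (inclusive).

    reads: iterable of (leftmost 1-based position, CIGAR string).
    PSI = non-split reads / (non-split reads + split reads) at each position;
    positions supported by fewer than min_reads reads are reported as None.
    """
    non_split = Counter()
    split = Counter()
    for pos, cigar in reads:
        ref = pos
        for length, op in CIGAR_OP.findall(cigar):
            length = int(length)
            if op in "M=XD":    # aligned bases (D counted as covered: assumption)
                target = non_split
            elif op == "N":     # reference skip: the read is split here
                target = split
            else:               # I, S, H, P do not consume the reference
                continue
            lo = max(ref, region_start)
            hi = min(ref + length, region_end + 1)
            for p in range(lo, hi):
                target[p] += 1
            ref += length
    psi = {}
    for p in range(region_start, region_end + 1):
        total = non_split[p] + split[p]
        psi[p] = non_split[p] / total if total >= min_reads else None
    return psi


# Toy cell: 8 reads cover positions 100-119 and 12 reads skip 100-129.
toy_reads = [(100, "20M")] * 8 + [(90, "10M30N10M")] * 12
toy_psi = psi_per_position(toy_reads, 95, 135)
print(toy_psi[95], toy_psi[100], toy_psi[125])
```

In the toy example, positions 100 to 119 are covered by 8 non-split and 12 split reads, giving a PSI of 0.4, while positions 120 to 129 are spanned only by split reads, giving a PSI of 0.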

Fig 1. Read coverage and percent spliced-in (PSI) profile of PKM mutually exclusive exons 9 and 10 in single cells from 63 induced pluripotent stem cells (iPSCs), 69 motor neuron cells (MNs), and 73 neural progenitor cells (NPCs) [8].

(A) Integrative Genomics Viewer (IGV) presentation of read coverage from bulk RNA-sequencing data. MNs showed different read coverage at exons 9 and 10 across replicates whereas NPCs and iPSCs showed consistently higher read coverage at exon 10 compared to exon 9 across all replicates. (B) IGV sashimi plot of selected single-cell RNA-sequencing data. Heterogeneity was observed in relative read coverage at exons 9 and 10 across single cells, with some single cells showing low-to-no coverage. (C-E) VALERIE presentation of PSI values from the entire single-cell RNA-sequencing dataset. (C) The heatmap unravelled two subpopulations of MNs and NPCs, whereby one subpopulation exclusively expressed exon 9 while the other exclusively expressed exon 10, whereas iPSCs consisted of a single homogeneous population that exclusively expressed exon 10. (D) Mean PSI values across the genomic coordinates corresponding to the flanking constitutive exons and mutually exclusive exons. Overall, iPSCs showed decreased usage of exon 10 and increased usage of exon 9 after differentiating into MNs or NPCs. MNs showed similar usage of exons 9 and 10 whereas NPCs showed higher usage of exon 10 compared to exon 9. Shaded regions represent the 95% confidence interval (CI) of the mean. (E) Differences in mean PSI values across iPSCs, MNs, and NPCs were statistically significant at the genomic coordinates corresponding to the alternative splicing event (mutually exclusive exons) but were, as expected, not statistically significant at the genomic coordinates corresponding to the flanking constitutive exons. P-values were computed using the Kruskal-Wallis test and adjusted for multiple testing using Bonferroni correction. The red dashed line indicates −log10 of the p-value of 0.05. Colour bar indicates scaled PSI values (z-scores) across rows (single cells). Grey regions in the heatmap indicate genomic positions with less than 10x coverage. Alt. exon: Alternatively spliced exon. Cons. exon: Constitutive exon. MXE: Mutually exclusive exons.

VALERIE enables visual inspection of alternative splicing events identified from genome-wide differential analysis of different groups of single cells. Multiple candidate events can be visualised simultaneously. For validating alternative splicing events identified from single-cell short-read RNA-sequencing, VALERIE may serve as a complement, or an alternative, to laboratory-based approaches such as RT-qPCR and smFISH. Verified alternative splicing events can subsequently be prioritised for downstream functional studies (S1B Fig).

Features

VALERIE provides several unique features to complement existing visualisation platforms for inspecting alternative splicing events at single-cell resolution:

  1. Displays PSI values for single cells instead of read coverage. PSI values are more suitable for representing alternative splicing profiles whereas read coverage is more suitable for representing gene expression profiles. It is widely reported that changes in alternative splicing profiles (differential isoform usage) are not always accompanied by changes in gene expression profiles [11, 26, 27]. For example, differential isoform usage analysis revealed differential usage of CD45 isoforms between CD45RO+ memory and CD45RA+ naïve T cell populations, whereas analysis using gene counts did not detect differential CD45 expression between these two cell populations [27].

  2. Displays PSI values at each genomic coordinate along the alternative exon and flanking constitutive exons. This is in contrast to current approaches that limit splice information to exon-exon junctions [13, 14]. Presenting PSI values at each genomic coordinate enables the consistency of the PSI profile across the entire exon length to be assessed. This is relevant for sequencing approaches that utilise very short reads, e.g. 50 bp single-end reads that do not span the entire exon length (S2 Fig) [9].

  3. Summarises PSI profiles for user-defined groups of single cells.

  4. Performs statistical tests to assess significant differences in PSI profiles between user-defined groups of single cells.

  5. Omits non-informative intronic regions, except in cases involving intron-retention alternative splicing events.

  6. Annotates and standardises relative positions of alternative and constitutive exons in 5’-to-3’ transcription direction.
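Features 3 and 4 can be illustrated with a minimal Python sketch of the group summary and the multiple-testing adjustment at a set of genomic coordinates. It is not VALERIE's R code. The function names are hypothetical, the 95% confidence interval uses a normal approximation (the text does not state how the interval is computed), and the two adjustment functions follow the standard Bonferroni and Benjamini-Hochberg procedures available as the "bonferroni" and "BH" methods of R's ‘p.adjust’.

```python
import math
from statistics import mean, stdev


def mean_ci(values, z=1.96):
    """Mean and normal-approximation 95% CI of PSI values for one group.

    None entries (positions below the coverage threshold) are dropped.
    The interval is not clipped to [0, 1].
    """
    x = [v for v in values if v is not None]
    m = mean(x)
    half = z * stdev(x) / math.sqrt(len(x)) if len(x) > 1 else 0.0
    return m, m - half, m + half


def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    n = len(pvals)
    return [min(1.0, p * n) for p in pvals]


def benjamini_hochberg(pvals):
    """False discovery rate adjustment (Benjamini-Hochberg step-up)."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = n - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical PSI values of one cell group at one genomic coordinate.
print(mean_ci([0.0, 1.0, None, 0.5]))

# Hypothetical raw p-values at four genomic coordinates.
raw = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(raw))
print(benjamini_hochberg(raw))

# Height of the significance line drawn at a p-value of 0.05.
print(-math.log10(0.05))
```

Adjusting across coordinates matters here because one test is performed per genomic position, so an event spanning a few hundred bases involves a few hundred tests.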

Results

To validate VALERIE, we identified high-quality single-cell alternative splicing events that have been validated using RT-qPCR and smFISH. We included a single-cell RNA-sequencing dataset of 63 induced pluripotent stem cells (iPSCs), 69 motor neuron cells (MNs), and 73 neural progenitor cells (NPCs) from a previous study [8]. In this study, bulk RNA-sequencing data for each cell type were also available. Thus, we were able to compare visualisation of alternative splicing events at both single-cell and bulk levels using VALERIE and existing visualisation platforms. The pyruvate kinase M1/2 (PKM) gene produces two main transcripts, PKM1 and PKM2, which are differentially expressed when iPSCs differentiate into MNs or NPCs. Specifically, iPSCs exclusively express exon 10 whereas MNs and NPCs consist of two subpopulations of cells that express either exon 9 or exon 10. Hence, the type of alternative splicing event PKM undergoes is mutually exclusive exons (MXE). Read coverage presentation of bulk RNA-sequencing data demonstrated exclusive exon 10 expression in iPSCs but simultaneous exon 9 and 10 expression in MNs and NPCs (Fig 1A). Sashimi plots of selected single cells revealed cell-to-cell heterogeneity of exon usage between the different cell types but were unable to capture differential exon usage across all cells (Fig 1B). Moreover, comparing separate plots for all cells may become overwhelming, and ultimately aggregating single cells by cell type becomes necessary, but this obscures cell-to-cell heterogeneity [14]. VALERIE captured cell-to-cell heterogeneity across all single cells and also enabled comparison of overall alternative splicing profiles across the different cell populations (Fig 1C–1E).
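The three-group comparison at a single genomic coordinate can be sketched in Python as a Kruskal-Wallis test with tie correction. This is an illustrative sketch, not VALERIE's R code: the function names are hypothetical, the PSI values are invented, and the p-value formula is restricted to exactly three groups (two degrees of freedom), for which the chi-square survival function has the closed form exp(−H/2).

```python
import math


def midranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_three_groups(a, b, c):
    """Return (H, p) for three non-empty groups of PSI values."""
    groups = [a, b, c]
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = midranks(pooled)
    h = 0.0
    offset = 0
    for g in groups:
        rank_sum = sum(ranks[offset:offset + len(g)])
        h += rank_sum * rank_sum / len(g)
        offset += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    # Tie correction; PSI values of exactly 0 or 1 are common in single cells.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical")
    h /= correction
    p = math.exp(-h / 2)  # chi-square survival function with 2 df
    return h, p


# Invented PSI values at one coordinate for three cell groups.
group_a = [0.0, 0.0, 0.1, 0.0, 0.05]
group_b = [0.0, 1.0, 0.9, 0.1, 1.0]
group_c = [0.3, 0.2, 0.4, 0.25, 0.35]
h_stat, p_value = kruskal_wallis_three_groups(group_a, group_b, group_c)
print(h_stat, p_value)
```

Running this function once per genomic coordinate yields the series of raw p-values that is then adjusted for multiple testing.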

We used the STAR splice-aware aligner to map sequencing reads to the human reference genome [28]. STAR is a popular choice among published single-cell alternative splicing studies [8, 9, 15, 24]. HISAT2 (hierarchical indexing for spliced alignment of transcripts) is a splice-aware aligner with superior alignment speed compared to other aligners [29]. To evaluate the PSI profiles generated by different aligners, we repeated the analysis on the same dataset using HISAT2. PKM mutually exclusive exons 9 and 10 similarly showed differential splicing across iPSCs, MNs, and NPCs (S3 Fig), albeit with slightly higher mean PSI values and a higher level of statistical significance (lower p-values) between cell types compared to the alignment using STAR. This may be attributed in part to differences in sensitivity and specificity between the two aligners [29]. We further subsampled aligned reads for each single cell to 50%, 25%, and 1% of the original read depth in order to simulate PSI profiles at varying read depths and gene expression levels. As expected, we observed higher power to detect differential usage of PKM mutually exclusive exons 9 and 10 at higher read depth, as reflected by the clearer mean PSI signal and higher significance level at genomic coordinates corresponding to the alternatively spliced exons (S4 Fig). We investigated the PSI profiles for alternative splicing events at different positions: at the 3’-end, at the 5’-end, and in the middle of the transcript. From the same study, we identified a representative alternative splicing event in each category that was validated using single-cell qPCR [8]. All alternatively spliced exons were observed to be differentially spliced across iPSCs, MNs, and NPCs (S5 Fig). Notably, constitutive exons which were either the last or first exon of the transcript showed varying coverage at the 3’-end and 5’-end, respectively (S5A and S5B Fig). Similar observations were reported previously from end-to-end sequencing of entire isoforms in single cells using long-read RNA-sequencing such as Nanopore and PacBio [30–32]. Interestingly, VALERIE revealed an alternative 3’ splice site (A3SS) on the 5’ constitutive exon (exon 5) based on changes in PSI values (S5C Fig), though this A3SS usage was not significantly different across the different cell types.
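The read-depth simulation above can be sketched in Python. The text does not state how the reads were subsampled, so this sketch assumes one common approach: each aligned read is kept independently with a probability equal to the target fraction. The function name and the toy read list are hypothetical.

```python
import random


def subsample_reads(reads, fraction, seed=0):
    """Keep each read independently with probability `fraction`.

    The resulting depth is approximately, not exactly, fraction * len(reads).
    A fixed seed makes the subsample reproducible.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    return [read for read in reads if rng.random() < fraction]


# Toy cell with 10,000 aligned reads, identified here only by an index.
toy_reads = list(range(10_000))
for fraction in (0.5, 0.25, 0.01):
    kept = subsample_reads(toy_reads, fraction)
    print(fraction, len(kept))
```

At 1% of the original depth, many genomic coordinates would fall below the 10-read threshold used for computing PSI, which is consistent with the weaker signal reported at low read depth.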

We measured the computational time required to process single cells from a study investigating alternative splicing dynamics in mouse oligodendrocytes [9]. VALERIE required an average of 27.3 s to process 2,000 single cells based on 10 repeated evaluations using the microbenchmark package. This evaluation was performed on an iMac with a 3.5 GHz quad-core Intel Core i5 processor and 32 GB of memory.

Availability and future directions

Recent advances in full-length library preparation protocols have enabled amplification and subsequent sequencing of small amounts of starting RNA material, such as that from single cells [33, 34]. Current visualisation platforms are optimised for visualising gene and alternative splicing profiles for small-scale bulk RNA-sequencing datasets [12–14]. VALERIE complements existing implementations by enabling visualisation of alternative splicing events for single-cell RNA-sequencing datasets typically generated by full-length library preparation methods such as Smart-seq2 [34]. It is not appropriate for high-throughput droplet-based platforms such as 10x Genomics Chromium. It would be of particular interest to extend VALERIE’s functionality to include visualisation of alternative splicing events from single-cell long-read RNA-sequencing datasets. VALERIE is available on the Comprehensive R Archive Network (CRAN): https://cran.r-project.org/web/packages/VALERIE/index.html.

Supporting information

S1 Fig. Overview of VALERIE.

(A) The workflow of data processing steps. VALERIE computes percent spliced-in (PSI) values from read coverage information and integrates alternative splicing coordinates and cell group annotations from the exon and sample information files to generate a heatmap of PSI values, and line graphs of mean PSI and adjusted p-values at each nucleotide position. (B) The role of VALERIE in the overall process of identifying candidate alternative splicing events for downstream functional studies. VALERIE serves as a means of visual inspection and validation of alternative splicing events identified from genome-wide analysis such as differential analysis. In conjunction with, or as an alternative to, other technical validation approaches such as sc-qPCR, VALERIE can enable selection of candidate alternative splicing events for downstream functional validation. A3SS: Alternative 3’ splice site. A5SS: Alternative 5’ splice site. MXE: Mutually exclusive exons. RI: Retained-intron. sc-qPCR: Single-cell quantitative polymerase chain reaction. SE: Skipped-exon. smFISH: Single-molecule fluorescence in situ hybridisation.

(TIFF)

S2 Fig. Percent spliced-in (PSI) profile of Mbp exon 2 skipping in 46 single cells from mice induced with experimental autoimmune encephalomyelitis (EAE) and 57 single cells from healthy mice [9].

Top: Sparse profile of PSI values due to long exon lengths relative to sequencing reads. Here, exons 1, 2, and 3 are 171, 78, and 102 base pairs (bp) in length whereas libraries were sequenced in 50 bp single-end mode. Middle: Mean PSI values across the genomic coordinates corresponding to the flanking constitutive exons and skipped-exon. Overall, single cells from EAE mice showed increased exon 2 usage compared to single cells from control mice. Bottom: Differences in mean PSI values across EAE and control cell groups were statistically significant at the genomic coordinates corresponding to the alternative splicing event (skipped-exon) but were, as expected, not statistically significant at the genomic coordinates corresponding to the flanking constitutive exons. P-values were computed using the Wilcoxon rank-sum test and adjusted for multiple testing using Bonferroni correction. The red dashed line indicates −log10 of the p-value of 0.05. Colour bar indicates scaled PSI values (z-scores) across rows (single cells). Grey regions in the heatmap indicate genomic positions with less than 10x coverage. Alt. exon: Alternatively spliced exon. Cons. exon: Constitutive exon. SE: Skipped-exon.

(TIFF)

S3 Fig. Percent spliced-in (PSI) profile of PKM mutually exclusive exons 9 and 10 in single cells from 63 induced pluripotent stem cells (iPSCs), 69 motor neuron cells (MNs), and 73 neural progenitor cells (NPCs).

Sequencing reads were aligned using HISAT2 [29] in lieu of STAR [28] as in Fig 1. Alignment with HISAT2 similarly showed significant differential PKM mutually exclusive exon usage across the three cell populations. P-values were computed using Kruskal-Wallis test and adjusted for multiple testing using Bonferroni correction. The red dashed line indicates −log10 of the p-value of 0.05. Colour bar indicates scaled PSI values (z-scores) across rows (single cells). Grey regions in the heatmap indicate genomic positions with less than 10x coverage. Alt. exon: Alternatively spliced exon. Cons. exon: Constitutive exon. MXE: Mutually exclusive exons.

(TIFF)

S4 Fig. Percent spliced-in (PSI) profile of PKM mutually exclusive exons 9 and 10 in single cells from 63 induced pluripotent stem cells (iPSCs), 69 motor neuron cells (MNs), and 73 neural progenitor cells (NPCs) at different read depths.

Aligned sequencing reads were subsampled to yield read depth of (A) 50%, (B) 25%, and (C) 1% of the original read depth to simulate PSI profile at different read depth. P-values were computed using Kruskal-Wallis test and adjusted for multiple testing using false discovery rate (FDR). The red dashed line indicates −log10 of the p-value of 0.05. Colour bar indicates scaled PSI values (z-scores) across rows (single cells). Grey regions in the heatmap indicate genomic positions with less than 10x coverage. Alt. exon: Alternatively spliced exon. Cons. exon: Constitutive exon. MXE: Mutually exclusive exons.

(TIFF)

S5 Fig. Percent spliced-in (PSI) profiles of alternatively spliced exons located at 3’-end, 5’-end, and in the middle of the transcript.

(A) RPS24 alternatively spliced exon positioned at the 3’-end (2nd last exon) of the ENST00000435275.5_4 transcript, which consists of 6 exons. (B) RBPJ alternatively spliced exon positioned at the 5’-end (2nd exon) of the ENST00000355476.7_2 transcript, which consists of 12 exons. (C) DYNC1I2 alternatively spliced exon positioned in the middle (6th exon) of the ENST00000355476.7_2 transcript, which consists of 18 exons. Differences in PSI values on the DYNC1I2 5’ constitutive exon (exon 5) revealed an alternative 3’ splice site (A3SS) located on the exon. This A3SS is annotated in GENCODE. Transcript IDs correspond to GENCODE v34lift37. P-values were computed using the Kruskal-Wallis test and adjusted for multiple testing using Bonferroni correction. The red dashed line indicates −log10 of the p-value of 0.05. Colour bar indicates scaled PSI values (z-scores) across rows (single cells). Grey regions in the heatmap indicate genomic positions with less than 10x coverage. Alt. exon: Alternatively spliced exon. Cons. exon: Constitutive exon. SE: Skipped-exon.

(TIFF)

S1 File. VALERIE source code, documentation, and test data.

(GZ)

Acknowledgments

The authors would like to thank the members of WIMM Centre for Computational Biology for testing the VALERIE package and providing useful input.

Data Availability

All relevant data are within the manuscript and its Supporting Information files. The software is available on the Comprehensive R Archive Network (CRAN): https://cran.r-project.org/web/packages/VALERIE/index.html.

Funding Statement

The Clarendon Fund and Oxford-Radcliffe Scholarship in conjunction with WIMM Prize PhD Studentship to W.X.W., Medical Research Council (MRC) Senior Clinical Fellowship and CRUK Senior Cancer Research Fellowship to A.J.M., and Oxford-Bristol Myers Squibb (BMS) Fellowship to S.T. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Cancer Genome Atlas Research Network, Weinstein JN, Collisson EA, Mills GB, Shaw KR, Ozenberger BA, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet. 2013;45(10):1113–20. Epub 2013/09/28. doi: 10.1038/ng.2764
  • 2. International Cancer Genome Consortium, Hudson TJ, Anderson W, Artez A, Barker AD, Bell C, et al. International network of cancer genome projects. Nature. 2010;464(7291):993–8. Epub 2010/04/16. doi: 10.1038/nature08987
  • 3. Wen WX, Leong CO. Association of BRCA1- and BRCA2-deficiency with mutation burden, expression of PD-L1/PD-1, immune infiltrates, and T cell-inflamed signature in breast cancer. PLoS One. 2019;14(4):e0215381. Epub 2019/04/26. doi: 10.1371/journal.pone.0215381
  • 4. Smith MA, Choudhary GS, Pellagatti A, Choi K, Bolanos LC, Bhagat TD, et al. U2AF1 mutations induce oncogenic IRAK4 isoforms and activate innate immune pathways in myeloid malignancies. Nat Cell Biol. 2019;21(5):640–50. Epub 2019/04/24. doi: 10.1038/s41556-019-0314-5
  • 5. Chen M, Manley JL. Mechanisms of alternative splicing regulation: insights from molecular and genomics approaches. Nat Rev Mol Cell Biol. 2009;10(11):741–54. Epub 2009/09/24. doi: 10.1038/nrm2777
  • 6. Wen WX, Mead AJ, Thongjuea S. Technological advances and computational approaches for alternative splicing analysis in single cells. Comput Struct Biotechnol J. 2020;18:332–43. Epub 2020/02/27. doi: 10.1016/j.csbj.2020.01.009
  • 7. Arzalluz-Luque A, Conesa A. Single-cell RNAseq for the study of isoforms-how is that possible? Genome Biol. 2018;19(1):110. Epub 2018/08/12. doi: 10.1186/s13059-018-1496-z
  • 8. Song Y, Botvinnik OB, Lovci MT, Kakaradov B, Liu P, Xu JL, et al. Single-Cell Alternative Splicing Analysis with Expedition Reveals Splicing Dynamics during Neuron Differentiation. Mol Cell. 2017;67(1):148–61 e5. Epub 2017/07/05. doi: 10.1016/j.molcel.2017.06.003
  • 9. Falcao AM, van Bruggen D, Marques S, Meijer M, Jakel S, Agirre E, et al. Disease-specific oligodendrocyte lineage cells arise in multiple sclerosis. Nat Med. 2018;24(12):1837–44. Epub 2018/11/14. doi: 10.1038/s41591-018-0236-y
  • 10. Shalek AK, Satija R, Adiconis X, Gertner RS, Gaublomme JT, Raychowdhury R, et al. Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells. Nature. 2013;498(7453):236–40. Epub 2013/05/21. doi: 10.1038/nature12172
  • 11. Welch JD, Hu Y, Prins JF. Robust detection of alternative splicing in a population of single cells. Nucleic Acids Res. 2016;44(8):e73. Epub 2016/01/08. doi: 10.1093/nar/gkv1525
  • 12. Katz Y, Wang ET, Airoldi EM, Burge CB. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat Methods. 2010;7(12):1009–15. Epub 2010/11/09. doi: 10.1038/nmeth.1528
  • 13. Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES, Getz G, et al. Integrative genomics viewer. Nat Biotechnol. 2011;29(1):24–6. Epub 2011/01/12. doi: 10.1038/nbt.1754
  • 14. Garrido-Martin D, Palumbo E, Guigo R, Breschi A. ggsashimi: Sashimi plot revised for browser- and annotation-independent splicing visualization. PLoS Comput Biol. 2018;14(8):e1006360. Epub 2018/08/18. doi: 10.1371/journal.pcbi.1006360
  • 15. Manipur I, Granata I, Guarracino MR. Exploiting single-cell RNA sequencing data to link alternative splicing and cancer heterogeneity: A computational approach. Int J Biochem Cell Biol. 2019;108:51–60. Epub 2019/01/12. doi: 10.1016/j.biocel.2018.12.015
  • 16. Huang Y, Sanguinetti G. BRIE: transcriptome-wide splicing quantification in single cells. Genome Biol. 2017;18(1):123. Epub 2017/06/29. doi: 10.1186/s13059-017-1248-5
  • 17. Munoz JF, Delorey T, Ford CB, Li BY, Thompson DA, Rao RP, et al. Coordinated host-pathogen transcriptional dynamics revealed using sorted subpopulations and single macrophages infected with Candida albicans. Nat Commun. 2019;10(1):1607. Epub 2019/04/10. doi: 10.1038/s41467-019-09599-8
  • 18. Shiozawa Y, Malcovati L, Galli A, Sato-Otsubo A, Kataoka K, Sato Y, et al. Aberrant splicing and defective mRNA production induced by somatic spliceosome mutations in myelodysplasia. Nat Commun. 2018;9(1):3649. Epub 2018/09/09. doi: 10.1038/s41467-018-06063-x
  • 19. Schischlik F, Jager R, Rosebrock F, Hug E, Schuster M, Holly R, et al. Mutational landscape of the transcriptome offers putative targets for immunotherapy of myeloproliferative neoplasms. Blood. 2019;134(2):199–210. Epub 2019/05/09. doi: 10.1182/blood.2019000519
  • 20. Mapleson D, Venturini L, Kaithakottil G, Swarbreck D. Efficient and accurate detection of splice junctions from RNA-seq with Portcullis. Gigascience. 2018;7(12). Epub 2018/11/13. doi: 10.1093/gigascience/giy131
  • 21. Nellore A, Jaffe AE, Fortin JP, Alquicira-Hernandez J, Collado-Torres L, Wang S, et al. Human splicing diversity and the extent of unannotated splice junctions across human RNA-seq samples on the Sequence Read Archive. Genome Biol. 2016;17(1):266. Epub 2017/01/01. doi: 10.1186/s13059-016-1118-6
  • 22. Wang L, Xi Y, Yu J, Dong L, Yen L, Li W. A statistical method for the detection of alternative splicing using RNA-seq. PLoS One. 2010;5(1):e8529. Epub 2010/01/15. doi: 10.1371/journal.pone.0008529
  • 23. Lawrence M, Huber W, Pages H, Aboyoun P, Carlson M, Gentleman R, et al. Software for computing and annotating genomic ranges. PLoS Comput Biol. 2013;9(8):e1003118. Epub 2013/08/21. doi: 10.1371/journal.pcbi.1003118
  • 24. Linker SM, Urban L, Clark SJ, Chhatriwala M, Amatya S, McCarthy DJ, et al. Combined single-cell profiling of expression and DNA methylation reveals splicing regulation and heterogeneity. Genome Biol. 2019;20(1):30. Epub 2019/02/13. doi: 10.1186/s13059-019-1644-0
  • 25. Liu W, Zhang X. Single-cell alternative splicing analysis reveals dominance of single transcript variant. Genomics. 2020;112(3):2418–25. Epub 2020/01/26. doi: 10.1016/j.ygeno.2020.01.014
  • 26. Vu TN, Wills QF, Kalari KR, Niu N, Wang L, Pawitan Y, et al. Isoform-level gene expression patterns in single-cell RNA-sequencing data. Bioinformatics. 2018;34(14):2392–400. Epub 2018/03/01. doi: 10.1093/bioinformatics/bty100
  • 27. Ntranos V, Yi L, Melsted P, Pachter L. A discriminative learning approach to differential expression analysis for single-cell RNA-seq. Nat Methods. 2019;16(2):163–6. Epub 2019/01/22. doi: 10.1038/s41592-018-0303-9
  • 28. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29(1):15–21. Epub 2012/10/30. doi: 10.1093/bioinformatics/bts635
  • 29. Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nat Methods. 2015;12(4):357–60. Epub 2015/03/10. doi: 10.1038/nmeth.3317
  • 30. Legnini I, Alles J, Karaiskos N, Ayoub S, Rajewsky N. FLAM-seq: full-length mRNA sequencing reveals principles of poly(A) tail length control. Nat Methods. 2019;16(9):879–86. Epub 2019/08/07. doi: 10.1038/s41592-019-0503-y
  • 31. Byrne A, Beaudin AE, Olsen HE, Jain M, Cole C, Palmer T, et al. Nanopore long-read RNAseq reveals widespread transcriptional variation among the surface receptors of individual B cells. Nat Commun. 2017;8:16027. Epub 2017/07/20. doi: 10.1038/ncomms16027
  • 32. Gupta I, Collier PG, Haase B, Mahfouz A, Joglekar A, Floyd T, et al. Single-cell isoform RNA sequencing characterizes isoforms in thousands of cerebellar cells. Nat Biotechnol. 2018. Epub 2018/10/16. doi: 10.1038/nbt.4259
  • 33. Ramskold D, Luo S, Wang YC, Li R, Deng Q, Faridani OR, et al. Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells. Nat Biotechnol. 2012;30(8):777–82. Epub 2012/07/24. doi: 10.1038/nbt.2282
  • 34. Picelli S, Bjorklund AK, Faridani OR, Sagasser S, Winberg G, Sandberg R. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat Methods. 2013;10(11):1096–8. Epub 2013/09/24. doi: 10.1038/nmeth.2639
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008195.r001

Decision Letter 0

Mihaela Pertea

23 May 2020

Dear Dr. Thongjuea,

Thank you very much for submitting your manuscript "VALERIE: Visual-based inspection of alternative splicing events at single-cell resolution" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. 

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Mihaela Pertea

Software Editor

PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: In this manuscript, Wen and colleagues describe a new tool for the visualization of alternative splicing events at single-cell resolution. They have named it VALERIE and they have made it available as an R package on CRAN. The authors have provided an example of the ability of VALERIE (compared to other platforms) to visualize an alternative splicing event at both single-cell and bulk levels using RNA sequencing data from the study by Song et al (Mol Cell 2017) on induced pluripotent stem cells, motor neuron cells and neural progenitor cells. Overall, VALERIE has several useful features that complement existing visualization platforms and is a valuable addition to the relatively small toolbox currently available for the visualization of alternative splicing events from RNA-seq data.

Comments

- The authors have demonstrated the ability of VALERIE to visualize alternative splicing events and perform statistical tests using a single example of mutually exclusive exons (MXE) in the PKM gene in iPSCs and neuron cells. It would be interesting to see the visualization of at least another alternative splicing event (preferably a different type from MXE) either from the same study or from a different suitable study. Another example is shown in Figure S1, but this was specifically presented for data including very short reads.

- Line 82: The authors should specify which panel of Figure 1 refers to the image generated by VALERIE.

- In Figure 1 and Figure S1, it would be helpful to add a short text describing what the heatmap shows (i.e. PSI values) next to the colorbar.

- In the Introduction, the syntax in line 10 should be corrected.

Reviewer #2: The article by Wei Xiong Wen describes a novel tool for visualisation of alternative splicing events in single-cell RNA sequencing data.

Alternative splicing is an important mechanism for the generation of proteomic diversity from the genome, and is critically overlooked in most single-cell RNA-sequencing studies. Tools that assist in the detection and visualisation of alternative splicing events from single-cell data are therefore very welcome.

The VALERIE R package enables visualisation of percent spliced in (PSI) in lieu of read coverage to enable visualisation of a number of single-cell samples in parallel, using BAM files as an input.

The paper is concise, and the tool itself is useful – I have a number of relatively minor comments I would like to see addressed in a final publication.

1) How can this tool be used in a discovery process? The authors show that it can potentially resolve alternative splicing events in individual genes, but it would be important that the authors demonstrate how the tool can be used in conjunction with others to identify alternative splicing events from a large dataset and rank them on the basis of supporting evidence. A graphical overview of the data processing would be helpful.

2) It must be made clearer that this method is appropriate primarily for plate-based methods such as Smart-seq2, which capture full-length sequence (although this is fragmented for sequencing), and is not appropriate for e.g. 10x Genomics libraries.

3) Is there a hypothetical maximum number of cells the tool can cope with?

4) Are there any mapping constraints for the method, i.e. are particular aligners more favourable than others? Is there any other pre-processing that might affect the outcome of the analysis?

5) Similarly it would be essential to reflect on any coverage constraints for detection – e.g. how does the tool fare with genes at different expression levels (perhaps percentiles of overall expression levels) or indeed genes for which splicing events are located at the 3’, mid or 5’ of the transcript. I understand that some limitations here will be due to the biology/molecular biology preceding this analysis but it is important for potential users of the tool to understand potential limitations when thinking about this kind of analysis.

6) The main figure is quite basic and cluttered in appearance; panel C should be split into C, D and E, and some additional explanation should be added to the figure legend.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see http://journals.plos.org/ploscompbiol/s/submission-guidelines#loc-materials-and-methods

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008195.r003

Decision Letter 1

Mihaela Pertea

26 Jul 2020

Dear Dr. Thongjuea,

We are pleased to inform you that your manuscript 'VALERIE: Visual-based inspection of alternative splicing events at single-cell resolution' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Mihaela Pertea

Software Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors have addressed all my comments. I have no further comments.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008195.r004

Acceptance letter

Mihaela Pertea

28 Aug 2020

PCOMPBIOL-D-20-00566R1

VALERIE: Visual-based inspection of alternative splicing events at single-cell resolution

Dear Dr Thongjuea,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Laura Mallard

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    Supplementary Materials

    S1 Fig. Overview of VALERIE.

    (A) The workflow of data processing steps. VALERIE computes percent spliced-in (PSI) values from read coverage information and integrates alternative splicing coordinates and cell group annotations from exon and sample information files to generate a heatmap of PSI values, and line graphs of mean PSI and adjusted p-values at each nucleotide position. (B) The role of VALERIE in the overall process of identifying candidate alternative splicing events for downstream functional studies. VALERIE serves as a visual inspection and validation step for alternative splicing events identified from genome-wide analysis such as differential analysis. In conjunction with, or as an alternative to, other technical validation approaches such as sc-qPCR, VALERIE can enable selection of candidate alternative splicing events for downstream functional validation. A3SS: Alternative 3’ splice site. A5SS: Alternative 5’ splice site. MXE: Mutually exclusive exons. RI: Retained-intron. sc-qPCR: Single-cell quantitative polymerase chain reaction. SE: Skipped-exon. smFISH: Single-molecule fluorescence in situ hybridisation.

    (TIFF)
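
The quantities named in panel (A) can be illustrated with a minimal Python sketch. It assumes that PSI at a nucleotide is the percentage of reads supporting inclusion out of all reads that either cover or splice across that position, and it masks positions below 10x total coverage, the cutoff stated for the grey heatmap regions in the other captions. The function names and input layout are hypothetical and do not reflect VALERIE's own R interface.

```python
from typing import Dict, List, Optional, Sequence

MIN_COVERAGE = 10  # positions below this total depth are treated as missing


def psi_per_position(included: Sequence[int], skipped: Sequence[int],
                     min_coverage: int = MIN_COVERAGE) -> List[Optional[float]]:
    """PSI (in percent) per position; None where total coverage is too low."""
    if len(included) != len(skipped):
        raise ValueError("included and skipped must have the same length")
    psi: List[Optional[float]] = []
    for inc, skp in zip(included, skipped):
        total = inc + skp
        psi.append(100.0 * inc / total if total >= min_coverage else None)
    return psi


def mean_psi_by_group(psi_by_cell: Dict[str, List[Optional[float]]],
                      group_of: Dict[str, str]) -> Dict[str, List[Optional[float]]]:
    """Mean PSI per position within each cell group, ignoring missing values."""
    n_pos = len(next(iter(psi_by_cell.values())))
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, List[int]] = {}
    for cell, profile in psi_by_cell.items():
        group = group_of[cell]
        s = sums.setdefault(group, [0.0] * n_pos)
        c = counts.setdefault(group, [0] * n_pos)
        for i, value in enumerate(profile):
            if value is not None:
                s[i] += value
                c[i] += 1
    # A position with no observed cell in a group stays missing (None).
    return {g: [sums[g][i] / counts[g][i] if counts[g][i] else None
                for i in range(n_pos)]
            for g in sums}
```

Each cell contributes one PSI profile, and the group means are what a line graph of mean PSI would plot at each position.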

    S2 Fig. Percent spliced-in (PSI) profile of Mbp exon 2 skipping in 46 single cells from mice induced with experimental autoimmune encephalomyelitis (EAE) and 57 single cells from healthy mice [9].

    Top: Sparse profile of PSI values due to long exon lengths relative to sequencing read length. Here, exons 1, 2, and 3 are 171, 78, and 102 base pairs (bp) in length, whereas libraries were sequenced in 50 bp single-end mode. Middle: Mean PSI values across the genomic coordinates corresponding to the flanking constitutive exons and skipped-exon. Overall, single cells from EAE mice showed increased exon 2 usage compared to single cells from control mice. Bottom: Differences in mean PSI values between EAE and control cell groups were statistically significant at the genomic coordinates corresponding to the alternative splicing event (skipped-exon) but were, as expected, not statistically significant at the genomic coordinates corresponding to the flanking constitutive exons. P-values were computed using the Wilcoxon rank-sum test and adjusted for multiple testing using Bonferroni correction. The red dashed line indicates −log10 of a p-value of 0.05. Colour bar indicates scaled PSI values (z-scores) across rows (single cells). Grey regions in the heatmap indicate genomic positions with less than 10x coverage. Alt. exon: Alternatively spliced exon. Cons. exon: Constitutive exon. SE: Skipped-exon.

    (TIFF)
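
Three transformations mentioned in this caption (Bonferroni adjustment, the −log10 threshold at 0.05, and row-wise z-scores) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the package's code: the function names are hypothetical, the z-score uses the sample standard deviation, and positions masked for low coverage are assumed to be represented as None.

```python
import math
from statistics import mean, stdev
from typing import List, Optional, Sequence


def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Bonferroni adjustment: multiply by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def neg_log10(p: float) -> float:
    """-log10 transform, as used on the y-axis of a p-value panel."""
    return -math.log10(p)


def row_z_scores(row: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Scale one cell's PSI profile to z-scores; masked (None) values stay None."""
    observed = [v for v in row if v is not None]
    if len(observed) < 2:
        return [None] * len(row)  # not enough data to estimate a spread
    mu, sd = mean(observed), stdev(observed)
    if sd == 0:
        return [0.0 if v is not None else None for v in row]
    return [(v - mu) / sd if v is not None else None for v in row]


# Height of a dashed significance line drawn at p = 0.05 (about 1.30).
THRESHOLD = neg_log10(0.05)
```

A position would sit above the dashed line when `neg_log10` of its adjusted p-value exceeds `THRESHOLD`, that is, when the adjusted p-value is below 0.05.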

    S3 Fig. Percent spliced-in (PSI) profile of PKM mutually exclusive exons 9 and 10 in single cells from 63 induced pluripotent stem cells (iPSCs), 69 motor neuron cells (MNs), and 73 neural progenitor cells (NPCs).

    Sequencing reads were aligned using HISAT2 [29] instead of STAR [28], which was used for Fig 1. Alignment with HISAT2 similarly showed significant differential PKM mutually exclusive exon usage across the three cell populations. P-values were computed using the Kruskal-Wallis test and adjusted for multiple testing using Bonferroni correction. The red dashed line indicates −log10 of a p-value of 0.05. Colour bar indicates scaled PSI values (z-scores) across rows (single cells). Grey regions in the heatmap indicate genomic positions with less than 10x coverage. Alt. exon: Alternatively spliced exon. Cons. exon: Constitutive exon. MXE: Mutually exclusive exons.

    (TIFF)

    S4 Fig. Percent spliced-in (PSI) profile of PKM mutually exclusive exons 9 and 10 in single cells from 63 induced pluripotent stem cells (iPSCs), 69 motor neuron cells (MNs), and 73 neural progenitor cells (NPCs) at different read depths.

    Aligned sequencing reads were subsampled to yield read depths of (A) 50%, (B) 25%, and (C) 1% of the original read depth to simulate PSI profiles at different read depths. P-values were computed using the Kruskal-Wallis test and adjusted for multiple testing using false discovery rate (FDR). The red dashed line indicates −log10 of a p-value of 0.05. Colour bar indicates scaled PSI values (z-scores) across rows (single cells). Grey regions in the heatmap indicate genomic positions with less than 10x coverage. Alt. exon: Alternatively spliced exon. Cons. exon: Constitutive exon. MXE: Mutually exclusive exons.

    (TIFF)
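
The read-depth simulation and FDR adjustment described for S4 Fig can be sketched in Python as below. Two assumptions are made that the caption does not state: reads are subsampled uniformly at random without replacement, and "FDR" refers to the Benjamini-Hochberg procedure. Function names are hypothetical, and the reads are represented as a plain list rather than as an alignment file.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def subsample(reads: Sequence[T], fraction: float, seed: int = 0) -> List[T]:
    """Keep a random fraction of reads without replacement (seeded)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = round(len(reads) * fraction)
    return random.Random(seed).sample(list(reads), k)


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Calling `subsample(reads, 0.5)`, `subsample(reads, 0.25)`, and `subsample(reads, 0.01)` would mirror the three depths in panels (A) to (C); per-position p-values recomputed at each depth could then be passed to `benjamini_hochberg`.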

    S5 Fig. Percent spliced-in (PSI) profiles of alternatively spliced exons located at 3’-end, 5’-end, and in the middle of the transcript.

    (A) RPS24 alternatively spliced exon positioned at the 3’-end (2nd last exon) of the ENST00000435275.5_4 transcript, which consists of 6 exons. (B) RBPJ alternatively spliced exon positioned at the 5’-end (2nd exon) of the ENST00000355476.7_2 transcript, which consists of 12 exons. (C) DYNC1I2 alternatively spliced exon positioned in the middle (6th exon) of the ENST00000355476.7_2 transcript, which consists of 18 exons. Differences in PSI values on the DYNC1I2 5’ constitutive exon (exon 5) revealed an alternative 3’ splice site (A3SS) located on the exon. This A3SS is annotated in GENCODE. Transcript IDs correspond to GENCODE v34lift37. P-values were computed using the Kruskal-Wallis test and adjusted for multiple testing using Bonferroni correction. The red dashed line indicates −log10 of a p-value of 0.05. Colour bar indicates scaled PSI values (z-scores) across rows (single cells). Grey regions in the heatmap indicate genomic positions with less than 10x coverage. Alt. exon: Alternatively spliced exon. Cons. exon: Constitutive exon. SE: Skipped-exon.

    (TIFF)

    S1 File. VALERIE source code, documentation, and test data.

    (GZ)

    Attachment

    Submitted filename: WenWX_VALERIE_PLoS Comput Biol_response_to_reviewers.docx

    Data Availability Statement

    All relevant data are within the manuscript and its Supporting Information files. The software is available on the Comprehensive R Archive Network (CRAN): https://cran.r-project.org/web/packages/VALERIE/index.html.


    Articles from PLoS Computational Biology are provided here courtesy of PLOS
