eLife. 2020 Jun 29;9:e54603. doi: 10.7554/eLife.54603

Figure 4. Accounting for coverage biases reveals unimodal splicing distributions and differential splicing.

(a) Estimated total mRNA molecules captured per cell. (b) Estimated number of recovered mRNAs vs. splice junction reads for cassette exons, averaged across cells. Each dot corresponds to an exon; its color indicates the proportion of cells in which the exon has a binary observation (only one isoform observed). We analyzed exons with average Ψ̂ between 0.05 and 0.95. (c) Per-cell splice junction coverage rate in each dataset. (d) Cadm1 exon 8 alternative splicing appears binary in many cells in the Chen dataset. Correlation with lineage pseudotime: Spearman’s rₛ = 0.1. (e) Cadm1 exon 8 after removing cells with fewer than 10 recovered Cadm1 mRNA molecules and fewer splice junction reads than expected from 10 mRNAs (grey). Spearman’s rₛ = 0.52. (f) PCA projection and clustering of single cells in the Chen dataset, showing differentiation of mouse ES cells into neurons. Red line, lineage inferred with Slingshot. (g) Number of cassette exons with observations from at least 10 recovered mRNA molecules in at least 50% of cells in any cluster. (h) Stacked histograms showing the distribution of observed Ψ̂ for the exons in (g), in each cell cluster of the Chen dataset. Observations with fewer than 10 mRNA molecules were removed. We show exons with average Ψ̂ between 0.1 and 0.9 in each cluster. (i) QQ-plot comparing the quantiles of a uniform distribution (x-axis) with the quantiles of the distribution of p-values from the Kruskal-Wallis test (y-axis). A line along the diagonal (gray dotted line) would mean that the p-values are uniformly distributed; a lower area under the curve indicates enrichment for low p-values. The point on the x-axis at which each line crosses the dotted red line indicates the proportion of p-values below 0.05. (j) Fold enrichment of exons with a Kruskal-Wallis p < x among exons selected with the mRNA-based filter (blue) and among exons selected with a flat minimum-read filter (red). (k) Significance (p-value) of the enrichment in (j), estimated with the hypergeometric test and adjusted for FDR. (l) Example exons that pass the overall filter criteria in the Chen dataset and have p < 0.05 in the Kruskal-Wallis test.
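
The per-exon differential splicing test behind panels (h) to (l) can be sketched as follows. This is a minimal, standard-library-only sketch under stated assumptions: each cluster's observations arrive as hypothetical (Ψ̂, estimated recovered mRNAs) pairs, only the 10-mRNA threshold from (h) is applied (not the additional splice-junction-read criterion used in (e)), and the helper names are illustrative rather than the authors' code. `scipy.stats.kruskal` computes the same tie-corrected statistic.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def filter_by_mrna(
    cluster_obs: Dict[str, List[Tuple[float, float]]], min_mrna: float = 10.0
) -> Dict[str, List[float]]:
    """Hypothetical helper: keep a cell's Ψ̂ only if it has >= min_mrna recovered mRNAs."""
    return {c: [psi for psi, m in obs if m >= min_mrna] for c, obs in cluster_obs.items()}


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of the chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df (Abramowitz & Stegun 26.4.4): 2Q(chi) + 2*phi(chi)*sum chi^(2r-1)/(1*3*...*(2r-1))
    chi = math.sqrt(x)
    pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    series = term = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        term = chi if r == 1 else term * x / (2 * r - 1)
        series += term
    return math.erfc(chi / math.sqrt(2)) + 2 * pdf * series


def kruskal_wallis(groups: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Tie-corrected Kruskal-Wallis H and its chi-square p-value (df = groups - 1)."""
    groups = [list(g) for g in groups if len(g) > 0]
    if len(groups) < 2:
        raise ValueError("need at least two non-empty groups")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    # Ψ̂ has many ties at 0 and 1, so the tie correction matters here.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - ties / (n ** 3 - n)
    if correction == 0:  # all observations identical: no evidence of a difference
        return 0.0, 1.0
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)
```

For one exon, `kruskal_wallis(filter_by_mrna(obs).values())` returns (H, p); clusters left empty by the filter are dropped, and a `ValueError` signals that fewer than two clusters remain. Repeating this over exons gives the p-value distribution that a QQ-plot like (i) summarizes.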

Figure 4—figure supplement 1. Relationship between read coverage, captured mRNAs, and binarity in single cell datasets.

(a) Average number of splice junction reads required to have a 50% likelihood of observing both isoforms, from the simulations in Figure 3j. (b) Total mRNAs required to have a 50% likelihood of observing both isoforms, from the simulations in Figure 3k. (c) Total mapped reads per cell for each dataset. (d) Estimated average number of recovered mRNAs vs. average number of splice junction reads for each exon. (e) Expression of pluripotency and neuron differentiation marker genes in the Chen dataset; expression is scaled to each row’s minimum and maximum. Cells are ordered by inferred pseudotime. Top color key, clusters from agglomerative clustering; bottom color key, labels from Chen et al. The red box highlights two sub-groups of neurons identified by clustering. (f) Distribution of Ψ̂ observations toward extreme values in the first cluster of cells of each dataset. For each cutoff x on the x-axis, the y-axis shows the proportion of cells with Ψ̂ ≤ x or Ψ̂ ≥ 1 − x. Blue lines, exons passing the mRNA-based filter; red lines, exons removed by the filter. (g) Stacked histograms, as in Figure 4h, for the other datasets. The Fletcher dataset is excluded because only five exons pass the mRNA filter. (h) Stacked histograms of exons in the Chen dataset that do not pass the filter; exons were subsampled to match the number that pass the filter in each cell type cluster. (i) Sex-dependent distribution of Ψ̂ for the two bimodal exons of the Chen dataset after cell-type stratification: Smarcad1 exon 3 (chr6: 65043836–65044108) in early Epi cells and Nsfl1c exon 4 (chr2: 151502455–151502460) in late Epi cells.

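
The curve in panel (f) is a simple proportion per cutoff. The sketch below assumes one exon's Ψ̂ values in one cluster are given as a list; the function names are hypothetical. At x = 0 it reduces to the fraction of binary observations (only one isoform seen) used to color the exons in Figure 4b.

```python
from typing import List, Sequence


def extreme_fraction(psi: Sequence[float], x: float) -> float:
    """Fraction of cells whose Ψ̂ satisfies Ψ̂ <= x or Ψ̂ >= 1 - x."""
    if not psi:
        raise ValueError("no observations")
    hits = sum(1 for p in psi if p <= x or p >= 1 - x)
    return hits / len(psi)


def extreme_curve(psi: Sequence[float], cutoffs: Sequence[float]) -> List[float]:
    """One y-value per cutoff x; x = 0 gives the fraction of binary observations."""
    return [extreme_fraction(psi, x) for x in cutoffs]
```

Exons whose curve stays high even at small x are dominated by all-or-none observations, which is the pattern the mRNA-based filter is meant to separate from genuine bimodality.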
Figure 4—figure supplement 2. Analysis of differential splicing among selected exons with the Kruskal-Wallis analysis of variance.

(a) Precision of selecting exons with low p-values, comparing the mRNA-based filter with a flat minimum of 10 reads. (b) Recall. (c) F1-score. (d) Specificity. (e) Accuracy. (f) Number of exons with p ≤ x in each dataset.

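
A minimal sketch of how the scores in panels (a) to (e) could be computed for one p-value cutoff x. It assumes (the caption does not spell this out) that the "positives" are exons with p ≤ x, the "predictions" are exons a filter selects, and `universe` is the set of all tested exons; the exon IDs and the function name are hypothetical.

```python
from typing import Dict, Set


def selection_metrics(selected: Set[str], low_p: Set[str], universe: Set[str]) -> Dict[str, float]:
    """Score a filter's exon selection against the exons with low p-values."""
    tp = len(selected & low_p)             # selected and low p
    fp = len(selected - low_p)             # selected but not low p
    fn = len(low_p - selected)             # low p but not selected
    tn = len(universe - selected - low_p)  # neither

    def ratio(a: int, b: int) -> float:
        return a / b if b else 0.0

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(universe)),
    }
```

Calling this once with the mRNA-filter selection and once with the flat-minimum selection, for each x in a grid, yields paired curves like those compared in the panels.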
Figure 4—figure supplement 3. Analysis of differential splicing among selected exons with the autocorrelation test.

(a) Number of exons with p ≤ x in each dataset. (b) QQ-plot comparing the quantiles of a uniform distribution with the quantiles of the distribution of p-values from the autocorrelation test. (c) Fold enrichment of exons with an autocorrelation p < x among exons selected with the mRNA-based filter (blue) and among exons selected with a flat minimum-read filter (red). (d) Significance (p-value) of the enrichment shown in (c). For each p-value limit x, we estimate significance with the hypergeometric test and correct for multiple testing using the Benjamini-Hochberg FDR adjustment. (e) F1-score. (f) Specificity. (g) Accuracy.

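
The enrichment procedure described for panels (c) and (d) can be sketched with exact hypergeometric tail sums and a Benjamini-Hochberg step. This assumes a parameterization the caption does not state: N tested exons, K of them with p < x, n selected by the filter, and k in the overlap. Function names are hypothetical; `scipy.stats.hypergeom.sf(k - 1, N, K, n)` and `statsmodels`' `multipletests(..., method="fdr_bh")` are library equivalents.

```python
import math
from typing import List, Sequence


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """(k/n) / (K/N): share of low-p exons among the n selected vs. among all N."""
    return (k / n) / (K / N)


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N exons, K with p < x, n selected)."""
    upper = min(K, n)
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return tail / math.comb(N, n)


def bh_adjust(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted
```

One enrichment p-value is computed per cutoff x, and `bh_adjust` is then applied across the list of cutoffs.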
Figure 4—figure supplement 4. Distribution of Ψ̂ of example exons.

(a) Skipped Cadm1 exon 8 (chr9: 47829377–47829409). We show the relationship between pseudotime and observed Ψ̂, as well as violin plots of the distribution of Ψ̂ in each cluster before and after filtering. (b) Violin plots of the distribution of Ψ̂ in each cluster after filtering, for the example exons in Figure 4l: Nsfl1c exon 4 (chr2: 151502455–151502460), Rpn2 exon 16 (chr2: 157323223–157323270), Tecr exon 4 (chr8: 83573411–83573455), Zfp207 exon 8 (chr11: 80393084–80393176).