Figure 4.
Validation of predicted skipped exons in humans using ExonSkipDB. (A) Histogram of the probability that an exon is skippable, predicted from protein sequences derived from hg38 coding regions. Exons with a skip probability ≤0.5 are classified as constitutive, and exons with a skip probability >0.5 are classified as skippable. (B) Pie chart of exons annotated as constitutive (navy blue) or skippable (light blue) in hg38. Of the 622,195 analyzed exons, 12.9% are annotated as skippable and 87.1% as constitutive. (C) Pie charts of exons predicted as constitutive (navy blue) or skippable (light blue), and of the reading frame within each predicted cohort. Of the 622,195 analyzed exons, 24.6% are predicted as skippable and 75.4% as constitutive. Among exons predicted to be constitutive, 76.4% are out-of-frame (navy blue) and 23.6% are in-frame (light blue). Among exons predicted to be skippable, >99% are in-frame (light blue) and <1% are out-of-frame (navy blue). (D) Schematic of the data used to validate predicted skipped exons. ExonSkipDB (43) used GTEx and TCGA data to identify skipped exons; additional validated skipped exons were obtained from VastDB (44). (E) UpSet plots showing the overlap among skipped exons predicted by Exon ByPASS, annotated in hg38, and present in GTEx, TCGA, or VastDB. The overlap of exons predicted as skippable but not annotated in hg38 with exons identified in GTEx and TCGA or in VastDB is highlighted (light blue). (F) Violin plots of skip probability for skipped exons identified in TCGA (light blue) and GTEx (navy blue). The dashed line marks a skip probability of 0.5. (G) Receiver operating characteristic (ROC) curves for Exon ByPASS classification of all hg38 exons, comparing predictions with skipped exons in TCGA plus hg38 annotation (light blue) and with skipped exons in GTEx plus hg38 annotation (navy blue).
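The thresholding in panel (A), the reading-frame split in panel (C), and the ROC evaluation in panel (G) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Exon ByPASS implementation: it assumes a skip probability has already been computed for each exon, treats an exon as in-frame when its length in nucleotides is a multiple of three (so skipping it preserves the downstream reading frame), and computes the ROC area under the curve (AUC) with the rank-sum (Mann-Whitney) formula. The function names, label encoding (1 = exon observed as skipped in the reference set), and toy values are hypothetical.

```python
from typing import List, Sequence

SKIP_THRESHOLD = 0.5  # cutoff from panel (A): >0.5 skippable, <=0.5 constitutive


def classify(skip_prob: float, threshold: float = SKIP_THRESHOLD) -> str:
    """Hypothetical helper: label an exon by comparing its skip probability to the cutoff."""
    return "skippable" if skip_prob > threshold else "constitutive"


def is_in_frame(exon_length_nt: int) -> bool:
    """Assumed definition: an exon whose length is a multiple of 3 keeps the frame when skipped."""
    return exon_length_nt % 3 == 0


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the rank-sum formula; tied scores receive their average rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks: List[float] = [0.0] * n
    i = 0
    while i < n:
        # Extend j over a block of tied scores starting at position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy values only; not data from the figure.
    print(classify(0.5), classify(0.73))
    print(is_in_frame(120), is_in_frame(100))
    print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

The rank-based AUC avoids comparing every positive with every negative exon, which matters at the scale of the 622,195 exons analyzed here; a library routine such as scikit-learn's `roc_auc_score` would give the same value.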