Table 2.
Summary of the variant dataset
| Outcome | Donor, non-canonical | Donor, canonical | Acceptor, non-canonical | Acceptor, canonical | Total |
|---|---|---|---|---|---|
| Cryptic site creation | 143 | 7 | 191 | 13 | 354 |
| Canonical site disrupted | 1,125 | 3,576 | 360 | 2,882 | 7,943 |
| Other | 7 | 0 | 10 | 0 | 17 |
| Totals | 1,275 | 3,583 | 561 | 2,895 | 8,314 |
We assembled a collection of splice variants by curating the literature. During curation, we recorded metadata on the variant pathomechanism and the observed outcome. Based on the outcome, we categorized the variants into two major groups: (1) variants that disrupt a canonical splice site, leading either to activation of a cryptic splice site or to exon skipping, and (2) variants that activate a cryptic splice site. The curated set comprised 8,314 splice variants (Table 2): 4,858 donor variants and 3,456 acceptor variants. Of these, 1,836 were non-canonical and 6,478 were canonical (i.e., located at the ±1 or ±2 positions). In addition, 73,203 neutral variants were used as negative training examples.
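The marginal counts quoted above follow directly from the cells of Table 2. The short Python sketch below recomputes them as a consistency check. The dictionary layout and variable names are illustrative assumptions and not part of the original analysis; only the numbers are taken from the table and the text.

```python
# Cell counts copied from Table 2. Column order (an assumption of this sketch):
# donor non-canonical, donor canonical, acceptor non-canonical, acceptor canonical.
counts = {
    "Cryptic site creation": (143, 7, 191, 13),
    "Canonical site disrupted": (1125, 3576, 360, 2882),
    "Other": (7, 0, 10, 0),
}

# Row totals (per outcome) and column totals (per site type and canonicity).
row_totals = {outcome: sum(cells) for outcome, cells in counts.items()}
col_totals = [sum(column) for column in zip(*counts.values())]

donor = col_totals[0] + col_totals[1]
acceptor = col_totals[2] + col_totals[3]
non_canonical = col_totals[0] + col_totals[2]
canonical = col_totals[1] + col_totals[3]
total = sum(row_totals.values())

# Ratio of neutral (negative) variants to curated splice variants.
neutral = 73203
neutral_per_splice_variant = neutral / total

print(row_totals)
print(donor, acceptor, non_canonical, canonical, total)
```

Running the check confirms that the row and column sums agree with the "Total" column and "Totals" row of Table 2 and with the donor, acceptor, canonical, and non-canonical counts stated in the text.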