Table 1.
Tool | Algorithm | Approach | Score range | Characteristic | Training data | Input data | Nucleotide positions | Interface | Year |
---|---|---|---|---|---|---|---|---|---|
CADD | Support vector machine with linear kernel | ML | – | Integrates more than 60 genomic features into a single score | 13,141,299 SNVs, 627,071 insertions and 926,968 deletions from simulated and observed variants | VCF file | – | Website | 2014 |
DSSP | CNN with long short‐term memory | DL | 0–1 | Individual prediction for SDS and SAS | HS3D | 140 nt sequence with consensus sequence in the middle | 140 nt | Python script | 2018 |
GeneSplicer | Decision tree and Markov model | ML | 0–15 | Markov model captures additional dependencies among neighboring bases at splice sites | 1323 plant genes and 1115 human genes | FASTA sequence | Up to 80 nt on both sides of the splice site | Alamut | 2001 |
MaxEntScan | Maximum entropy | Other | 0–12 | Use of different constraints sorted by the effect on entropy, only second‐order dependencies | 1821 nonredundant transcripts with 12,715 introns | 9‐mer FASTA sequence | 9 nt at SDS, 23 nt at SAS | Alamut | 2004 |
MMSplice | Individual modules scoring exon, intron, and splice sites | DL | – | Predicts quantitative physical measures of splicing | Vex‐seq + GENCODE | VCF file | All nucleotides in intron, exon, intron structure | Python package | 2018 |
NNSPLICE | Hidden Markov model and neural network | ML | 0–1 | Captures pairwise correlations between adjacent nucleotides | 285 multiple‐exon human DNA sequences from GenBank | FASTA sequence | −7 to +8 at SDS, −21 to +20 at SAS | Alamut | 1997 |
SPIDEX | Bayesian modeling | ML | 0–1 | Tissue‐specific PSI values | Illumina Human Body Map 2.0 project | VCF file | Depending on features, up to 2000 nt in introns and 300 nt in exons | Txt file with precomputed values | 2015 |
SpliceAI | Deep learning with ResNet blocks | DL | 0–1 | Predicts nucleosome positioning from sequence | GENCODE | VCF file | 10,000 nt | Python package | 2019 |
SpliceRover | CNN | DL | 0–1 | Identifies regions/structures of interest by normalizing contribution scores; individual models for SDS and SAS | Human and plant | FASTA sequence | Minimum of 400 nt | Website | 2018 |
SpliceSiteFinder‐like | Position weight matrices | Other | 0–100 | – | – | – | – | Alamut | 1987 |
Abbreviations: CNN, convolutional neural network; DL, deep learning; HS3D, Homo Sapiens Splice Sites Dataset; ML, machine learning; nt, nucleotides; PSI, percent spliced in; SAS, splice acceptor site; SDS, splice donor site; VCF, variant call format.
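
To illustrate the simplest approach listed in the table, position weight matrices with a 0–100 score range, the following is a minimal Python sketch and not the implementation of any tool above. The five aligned 9-nt donor windows in `SITES` are hypothetical toy data, the function names `build_pwm` and `score` are illustrative, and the min-max rescaling to 0–100 is an assumption made here to mirror the score range in the table.

```python
import math

ALPHABET = "ACGT"

# Hypothetical toy alignment of 9-nt donor-site windows (illustrative only,
# not real training data from any tool in the table).
SITES = [
    "CAGGTAAGT",
    "AAGGTGAGT",
    "CAGGTAAGA",
    "TAGGTGAGT",
    "CAGGTATGG",
]


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length sequences."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for pos in range(length):
        # Start from a pseudocount so unseen bases do not give log(0).
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append(
            {base: math.log2((counts[base] / total) / background) for base in ALPHABET}
        )
    return pwm


def score(pwm, seq):
    """Sum per-position log-odds and rescale to 0-100 (assumed min-max scaling)."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match the matrix length")
    raw = sum(column[base] for column, base in zip(pwm, seq))
    lowest = sum(min(column.values()) for column in pwm)
    highest = sum(max(column.values()) for column in pwm)
    return 100.0 * (raw - lowest) / (highest - lowest)


pwm = build_pwm(SITES)
reference = score(pwm, "CAGGTAAGT")  # matches the most frequent base at every position
variant = score(pwm, "CAGATAAGT")    # G>A at the fourth position, never seen in SITES
print(f"reference: {reference:.1f}, variant: {variant:.1f}")
```

Because each position contributes independently to the sum, a matrix of this kind cannot capture the dependencies between positions that the Markov model, maximum entropy, and neural network approaches in the table are designed to model.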