2021 May 20;42(7):799–810. doi: 10.1002/humu.24212

Table 1.

Overview of the most important properties of the different splice prediction tools

| Tool | Approach | Algorithm | Score range | Characteristic | Training data | Input data | Nucleotide positions | Interface | Year |
|---|---|---|---|---|---|---|---|---|---|
| CADD | Support vector machine with linear kernel | ML | | Integrates more than 60 genomic features into a single score | 13,141,299 SNVs, 627,071 insertions and 926,968 deletions from simulated and observed variants | VCF file | | Website | 2014 |
| DSSP | CNN with long short‐term memory | DL | 0–1 | Individual prediction for SDS and SAS | HS3D | 140 nt sequence with consensus sequence in the middle | 140 nt | Python script | 2018 |
| GeneSplicer | Decision tree and Markov model | ML | 0–15 | Markov model captures additional dependencies among neighboring bases at splice sites | 1323 plant genes and 1115 human genes | FASTA sequence | Up to 80 nt on both sides of the splice site | Alamut | 2001 |
| MaxEntScan | Maximum entropy | Other | 0–12 | Use of different constraints sorted by the effect on entropy, only second‐order dependencies | 1821 nonredundant transcripts with 12,715 introns | 9‐mer FASTA sequence | 9 nt at SAS, 23 nt at SDS | Alamut | 2004 |
| MMSplice | Individual modules scoring exon, intron, and splice sites | DL | | Predicts quantitative physical measures of splicing | Vex‐seq + GENCODE | VCF file | All nucleotides in intron, exon, intron structure | Python package | 2018 |
| NNSPLICE | Hidden Markov model and neural network | ML | 0–1 | Captures pairwise correlations between adjacent nucleotides | 285 multiple‐exon human DNA sequences from GenBank | FASTA sequence | −7 to +8 at SAS, −21 to +20 at SDS | Alamut | 1997 |
| SPIDEX | Bayesian modeling | ML | 0–1 | Tissue‐specific PSI values | Illumina Human Body Map 2.0 project | VCF file | Depending on features, up to 2000 nt in introns and 300 nt in exons | Txt file with precomputed values | 2015 |
| SpliceAI | Deep learning with ResNet blocks | DL | 0–1 | Predicts nucleosome positioning from sequence | GENCODE | VCF file | 10,000 nt | Python package | 2019 |
| SpliceRover | CNN | DL | 0–1 | Identifies regions/structures of interest by normalizing contribution scores, and individual models for SDS and SAS | Human and plant | FASTA sequence | Minimum 400 nt | Website | 2018 |
| SpliceSiteFinder‐like | Position weight matrices | Other | 0–100 | | | | | Alamut | 1987 |

Abbreviations: CNN, convolutional neural network; DL, deep learning; ML, machine learning; nt, nucleotides; PSI, percent spliced in; SAS, splice acceptor site; SDS, splice donor site; VCF, variant call format.
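
The tools in Table 1 report scores on different scales (0–1, 0–12, 0–15, 0–100), so raw values cannot be read side by side. The following is a minimal sketch, not a method taken from the article, of linearly rescaling a raw score to the 0–1 interval using only the score ranges listed in the table. The dictionary `SCORE_RANGES` and the function `rescale` are hypothetical names introduced for illustration, and clipping out-of-range values is an assumption. CADD and MMSplice are omitted because the table lists no score range for them.

```python
# Score ranges copied from the "Score range" column of Table 1.
# Tools without a listed range (CADD, MMSplice) are intentionally left out.
SCORE_RANGES = {
    "DSSP": (0.0, 1.0),
    "GeneSplicer": (0.0, 15.0),
    "MaxEntScan": (0.0, 12.0),
    "NNSPLICE": (0.0, 1.0),
    "SPIDEX": (0.0, 1.0),
    "SpliceAI": (0.0, 1.0),
    "SpliceRover": (0.0, 1.0),
    "SpliceSiteFinder-like": (0.0, 100.0),
}


def rescale(tool: str, score: float) -> float:
    """Map a raw score onto 0-1 using the tool's listed score range.

    Assumption: values outside the listed range are clipped to its bounds.
    Raises KeyError for tools that have no range in the table.
    """
    low, high = SCORE_RANGES[tool]
    clipped = min(max(score, low), high)
    return (clipped - low) / (high - low)


if __name__ == "__main__":
    for tool, raw in [("MaxEntScan", 6.0), ("SpliceSiteFinder-like", 85.0)]:
        print(f"{tool}: raw={raw} rescaled={rescale(tool, raw):.2f}")
```

A linear rescaling only aligns the numeric ranges. It does not make a given value mean the same thing across tools, so any decision threshold would still need to be chosen per tool.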