Table 3.
Algorithm | Application | Basis of decision | Model training or evaluation | References |
---|---|---|---|---|
NMD Classifier | NMD | Prediction of NMD for a given transcript based on comparison with the most similar coding transcript | Simulation-based evaluation by screening artificial transcript structure-altering events | Hsu et al., 2017 |
NNSplice | Splicing (splice sites) | Sequence-based splice site analysis using HMM | Distinguishing splice site sequences from sequences in the neighborhood of real splice sites | Reese et al., 1997 |
MaxEntScan | Splicing (splice sites) | Splice site analysis by modeling short sequence motifs using the maximum entropy principle, with constraints estimated from available data | 1,821 transcripts unambiguously aligned across the entire coding region, spanning a total of 12,715 introns | Yeo and Burge, 2004 |
GeneSplicer | Splicing (splice sites) | Splice site prediction using maximal dependence decomposition, with the addition of Markov models to capture dependencies among neighboring bases | Annotated genes from the Exon-Intron Database | Pertea et al., 2001 |
SplicePort | Splicing (splice sites) | Splice site prediction using C-modified least squares learning based on positional and compositional sequence features | Training on 4,000 human pre-mRNA RefSeq sequences and testing on the B2Hum data set | Dogan et al., 2007 |
Skippy | Splicing (regulatory sequences) | Prediction of variants causing exon skipping, exon inclusion or ectopic splice site activation based on sequence information, proximity to splice junctions and evolutionary constraint of the peri-variant region | Multiple exonic splicing regulatory element datasets as positive data and HapMap variants as splicing-neutral controls | Woolfe et al., 2010 |
MutPred Splice | Splicing (regulatory sequences) | Prediction of auxiliary splice sequences using multiple variant-, flanking exon- and gene-based features | Splicing variants from HGMD as pathogenic set and non-splicing variants from both HGMD and 1000G as neutral controls | Mort et al., 2014 |
scSNVEL | Splicing (splice sites) | Ensemble prediction integrating the scores of 8 algorithms with random forest learning | Splice variants from HGMD, the SpliceDisease database and DBASS as pathogenic set and variants not implicated in splicing from both HGMD and 1000G as controls | Jian et al., 2014b |
SPANR | Splicing (splice sites and splice regulatory sequences) | Integrating 1,393 sequence features from each exon and its neighboring introns and exons to identify splice sites as well as intronic and exonic splice regulators | 10,689 exons that displayed evidence of alternative splicing | Xiong et al., 2015 |
CryptSplice | Splicing (splice sites) | Prediction of cryptic splice site activation using an SVM model | Sequences from the annotated NN269 and HS3D splice datasets, with positive sequences at splice sites and control sequences outside splice sites | Lee et al., 2017 |
Corvelo et al. | Splicing (branch points) | Analysis of splice site sequence conservation and position bias using SVM | A set of 8,156 conserved putative branch point sequences from 7 mammalian species | Corvelo et al., 2010 |
BPP | Splicing (branch points) | Identification of branch point motifs by integrating information on the branch point sequence and the polypyrimidine tract | Intron sequences longer than 300 nucleotides | Zhang et al., 2017 |
TurboFold | Splicing (pre-mRNA structure) | Probabilistic method that integrates comparative sequence analyses with thermodynamic folding models | Thorough benchmarking against three methods that estimate base pairing probabilities and eight tools for structural predictions based on known RNA structures | Harmanci et al., 2011 |
CentroidFold | Splicing (pre-mRNA structure) | RNA secondary structure prediction using the γ-centroid estimator | Validation on 151 experimentally determined RNA structures | Sato et al., 2009 |
mrSNP | miRNA binding | Calculation of miRNA binding energy for reference and variant-containing sequences, reporting the difference in binding | Evaluation on variants that map to miRNA targets predicted by TargetScan | Deveci et al., 2014 |
PinPor | RBP binding | Bayesian network approach that incorporates information about sequence features, stabilization of RNA secondary structure and evolutionary conservation | In-frame indels from HGMD as pathogenic set and common indels from 1000G as neutral controls | Zhang et al., 2014 |
HGMD, Human Gene Mutation Database; 1000G, 1000 Genomes Project; DBASS, Database for Aberrant Splice Sites; NMD, nonsense-mediated decay; HMM, hidden Markov model; RBP, RNA binding protein; SVM, support vector machine.
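The γ-centroid estimator listed for CentroidFold chooses the secondary structure that maximizes the sum of (γ + 1)p_ij − 1 over its base pairs, where p_ij is the probability that bases i and j pair and γ trades sensitivity against specificity. The following is a minimal Python sketch of that objective solved with a Nussinov-style dynamic program. It assumes a precomputed base-pairing probability matrix (for example from a McCaskill partition function calculation, which is not reproduced here), and the function name and the minimum hairpin loop of 3 nucleotides are illustrative assumptions; it is not CentroidFold's own implementation.

```python
from typing import List, Tuple


def gamma_centroid(prob: List[List[float]], gamma: float,
                   min_loop: int = 3) -> List[Tuple[int, int]]:
    """Hypothetical helper: gamma-centroid structure from pairing probabilities.

    prob[i][j] (i < j) is the assumed precomputed probability that bases
    i and j pair; only the upper triangle is read.
    """
    n = len(prob)

    def gain(i: int, j: int) -> float:
        # Contribution of pair (i, j) to the gamma-centroid objective.
        return (gamma + 1.0) * prob[i][j] - 1.0

    # best[i][j]: maximal total gain on subsequence i..j (inclusive).
    # Entries with j - i <= min_loop, and the unused lower triangle, stay 0.
    best = [[0.0] * n for _ in range(n)]
    for length in range(min_loop + 2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            s = max(best[i + 1][j], best[i][j - 1])  # i or j left unpaired
            if gain(i, j) > 0:
                s = max(s, best[i + 1][j - 1] + gain(i, j))  # i pairs with j
            for k in range(i + 1, j):  # split into two independent parts
                s = max(s, best[i][k] + best[k + 1][j])
            best[i][j] = s

    # Traceback: repeat the same float computations to find the chosen case.
    pairs: List[Tuple[int, int]] = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i < min_loop + 1:
            continue
        v = best[i][j]
        if v == best[i + 1][j]:
            stack.append((i + 1, j))
        elif v == best[i][j - 1]:
            stack.append((i, j - 1))
        elif gain(i, j) > 0 and v == best[i + 1][j - 1] + gain(i, j):
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if v == best[i][k] + best[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return sorted(pairs)
```

A pair contributes positive gain only when p_ij > 1/(γ + 1), so raising γ lowers that threshold and admits less probable pairs; with γ ≤ 1 the result reduces to collecting all pairs above the threshold.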
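Several splice site tools in the table score short sequence windows against a statistical model of true sites. In the simplest case, where only single-position base frequencies are constrained, a maximum entropy model of the kind used by MaxEntScan reduces to an independent-position weight matrix (WMM). The Python sketch below shows only that baseline log-odds scoring for a 9-nt donor window (3 exonic and 6 intronic positions); it is not MaxEntScan, which additionally models dependencies between positions. The function names, the pseudocount, the uniform background and the toy training sequences are assumptions for illustration.

```python
import math
from typing import Dict, List, Optional

BASES = "ACGT"


def train_wmm(sites: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Hypothetical helper: per-position base frequencies from aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all training sites must have the same length")
    wmm = []
    for pos in range(length):
        # Pseudocounts keep unseen bases from getting zero probability.
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[pos].upper()] += 1
        total = sum(counts.values())
        wmm.append({b: counts[b] / total for b in BASES})
    return wmm


def score_site(wmm: List[Dict[str, float]], seq: str,
               background: Optional[Dict[str, float]] = None) -> float:
    """Log2-odds score of seq under the WMM versus an assumed background."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    if len(seq) != len(wmm):
        raise ValueError("sequence length must match the matrix length")
    return sum(math.log2(col[b] / background[b])
               for col, b in zip(wmm, seq.upper()))
```

Positive scores indicate a window that resembles the training sites more than background sequence; a real model would be trained on many annotated donor sites rather than a handful of toy strings.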
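NMD prediction commonly relies on the position of the stop codon relative to the exon structure: under the widely used 50-nucleotide rule, a stop codon lying more than about 50 to 55 nt upstream of the last exon-exon junction is expected to trigger NMD. The Python sketch below applies that rule to a single transcript as background only; the table does not state that the NMD Classifier uses this exact rule. The function name, the coordinate convention (0-based transcript position of the first stop codon nucleotide) and the default threshold of 50 nt are assumptions.

```python
from typing import List


def predicts_nmd(exon_lengths: List[int], stop_start: int,
                 threshold: int = 50) -> bool:
    """Hypothetical helper applying the 50-nt rule to one transcript.

    exon_lengths: lengths of the transcript's exons in 5' to 3' order.
    stop_start: assumed 0-based transcript coordinate of the first
    nucleotide of the stop codon.
    """
    if len(exon_lengths) < 2:
        return False  # no exon-exon junction, so the rule cannot apply
    # The last junction sits after all exons except the final one.
    last_junction = sum(exon_lengths[:-1])
    distance = last_junction - stop_start
    return distance > threshold
```

A stop codon in the final exon gives a negative distance and is therefore never flagged, which matches the usual expectation that such transcripts escape junction-dependent NMD.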