Table 1.
Algorithm | Model | Basis of decision | Model training or evaluation | References |
---|---|---|---|---|
SIFT | Direct | Prediction of functionality based on sequence conservation metrics that make use of Dirichlet priors | Variants from protein specific studies (LacI, HIV-1 Protease and Bacteriophage T4 Lysozyme) | Ng and Henikoff, 2001 |
PANTHER | HMM | Sequence conservation analysis using HMM | Variants from HGMD and dbSNP as deleterious and functionally neutral variants, respectively | Thomas et al., 2003 |
MAPP | Direct | Quantification of the physicochemical characteristics at each position of the amino acid sequence based on observed evolutionary variation | Protein specific studies (LacI, HIV-1 Protease, HIV reverse transcriptase and Bacteriophage T4 Lysozyme) | Stone and Sidow, 2005 |
PhastCons | HMM | Identification of conserved elements using a two-state phylogenetic HMM | Calibration on genomes from four model species (human, D. melanogaster, C. elegans, and S. cerevisiae) | Siepel et al., 2005 |
SNPs3D | SVM | Variant effect prediction based on amino acid sequence conservation metrics and folded state stability of protein structure | Variants from HGMD and dbSNP as deleterious and functionally neutral variants, respectively | Yue et al., 2006 |
PhD-SNP | SVM | Prediction of variant pathogenicity based on sequence profiles | Variants from HumVar and HumVarProf datasets | Capriotti et al., 2006 |
SiPhy | HMM | Sequence conservation analysis using HMM | ENCODE Phase I regions | Garber et al., 2009 |
LRT | Direct | Evolutionary conservation model across 32 vertebrates | Variants in three sequenced human genomes | Chun and Fay, 2009 |
SNPs&GO | SVM | Variant effect prediction based on sequence information, evolutionary conservation and defined gene ontology score | Variants from SwissProt | Calabrese et al., 2009 |
B-SIFT | Direct | Sequence conservation metrics that calculate the difference between wild-type and mutant alleles | Variants from SwissProt database and protein specific study (DNase I) | Lee et al., 2009 |
PolyPhen-2 | NB | Sequence conservation and structural parameters such as hydrophobic propensity and B factor | Variants from the HumDiv and HumVar datasets (UniProt database) | Adzhubei et al., 2010 |
MutationTaster | NB | Prediction of mutation pathogenicity based on evolutionary conservation, splice-site changes, loss of protein features and changes that affect expression levels | Variants from OMIM database, HGMD and the literature as pathogenic set and neutral variants from dbSNP as controls | Schwarz et al., 2014 |
MutationAssessor | Direct | Evolutionary conservation patterns within protein families and across species using combinatorial entropy | Variants from UniProt database (HumSaVar) | Reva et al., 2011 |
Condel | Direct | Integration of five algorithms (SIFT, PolyPhen-2, MAPP, MutationAssessor, and Log R Pfam E-value) into a single output score | Variants from HumVar, HumDiv, Cosmic database, IARC TP53 database | González-Pérez and López-Bigas, 2011 |
PROVEAN | Direct | Alignment-based score that can also assess in-frame insertions, deletions, and multiple amino acid substitutions | Missense variants, indels and replacements from UniProt database | Choi et al., 2012 |
FATHMM | HMM | Identification of pathogenic variants based on sequence conservation, protein domain-based information and species-specific pathogenicity weights. Also suitable for prediction of non-coding variants. | Variants from the HGMD and UniProt databases | Shihab et al., 2013, 2015; Rogers et al., 2018 |
VEST | RF | Prioritization of variants underlying Mendelian diseases | Rare variants from HGMD database as pathogenic set and variants from ESP as controls | Carter et al., 2013 |
Evolutionary Action | Direct | Prediction of variant effects on evolutionary fitness using a formal genotype-phenotype perturbation equation | Variants from 1000 Genomes Project | Katsonis and Lichtarge, 2014 |
MetaSVM | SVM | Ensemble score integrating nine functionality predictors (SIFT, PolyPhen-2, GERP++, MutationTaster, MutationAssessor, FATHMM, LRT, SiPhy and PhyloP) | Variants causing Mendelian diseases as pathogenic set and variants that are not associated with any phenotypes as controls, all from UniProt database | Dong et al., 2015 |
MetaLR | RM | Same as MetaSVM but using logistic regression instead of SVM | Same as MetaSVM | Dong et al., 2015 |
SuSPect | SVM | Sequence conservation metrics, structure features and additional network information | Variants from Humsavar database | Yates et al., 2014 |
PredictSNP | EL | Ensemble score integrating six functionality predictors (MAPP, PhD-SNP, PolyPhen-1, PolyPhen-2, SIFT and SNAP) | Variants mainly from SwissProt, HGMD, dbSNP and Humsavar database | Bendl et al., 2014 |
SNAP2 | NN | Prediction of amino acid variations based on amino acid properties, predicted binding residues, predicted disordered and low-complexity regions, proximity to N- and C-terminus, statistical contact potentials, co-evolving positions, secondary structure and solvent accessibility | Variants from PMD, Swiss-Prot, OMIM, HumVar and protein specific data sets (LacI) | Hecht et al., 2015 |
REVEL | RF | Ensemble method tailored specifically for the prediction of rare genetic variant effects integrating MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons | Variants from HGMD as pathogenic set and neutral variants from ESP as controls | Ioannidis et al., 2016 |
ConSurf | Empirical Bayesian method and maximum likelihood estimation | Mapping of evolutionarily conserved residues on protein surfaces by estimating the evolutionary rate of each nucleic acid or amino acid sequence position using multiple sequence alignments. Also offers RNA secondary structure predictions. | Proteins with at least five known 3D structure homologs and precise annotation of their functional sites (of different nature) | Ashkenazy et al., 2016 |
VIPUR | RM | Combination of sequence- and structure-based features to identify and functionally interpret deleterious variants | Variants from HumDiv and UniProt with clear evidence of protein disruption | Baugh et al., 2016 |
Envision | GTB | Decision tree ensemble-based tool using a stochastic gradient boosting learning algorithm | Variants from nine large-scale experimental mutagenesis datasets in eight proteins | Gray et al., 2018 |
EVmutation | Direct | Unsupervised method exploiting sequence conservation by incorporating interaction information between all pairs of residues in protein | 34 data sets from 21 proteins and a tRNA gene extracted from 27 publications | Hopf et al., 2017 |
PredSAV | GTB | Identification of pathogenic variants based on sequence, structure, residue-contact networks as well as structural neighborhood features | Human variants from UniProt and OMIM as pathogenic set and Ensembl variants as neutral controls | Pan et al., 2017 |
SNPMuSiC | NN | Structural stability-based; implements PoPMuSiC and HoTMuSiC on the basis of 13 statistical potentials (distance potentials, solvent accessibility potentials and torsion potentials) and 2 biophysical characteristics (solvent accessibility of the mutated residue and difference in volume) | Variants from dbSNP, SwissVar and HumSaVar datasets | Ancien et al., 2018 |
DEOGEN2 | RF | Integration of 11 scores and metrics into one meta-score, considering evolutionary features, folding predictions, domain information as well as gene features to identify deleterious variants | Training and testing on variants from the UniProt Humsavar16 dataset | Raimondi et al., 2017 |
ADME prediction framework | Direct | Integration of prediction scores from five orthogonal algorithms (LRT, MutationAssessor, PROVEAN, VEST3 and CADD) using parameters optimized for pharmacogenes | Training and validation specifically on experimentally characterized pharmacogenetic data sets from 43 ADME genes | Zhou et al., 2018 |
HMM, hidden Markov model; SVM, support vector machine; NB, naïve Bayes classifier; EL, ensemble learning; RF, random forest; RM, regression model; NN, neural networks; GTB, gradient tree boosting; HGMD, Human Gene Mutation Database; OMIM, Online Mendelian Inheritance in Man; ESP, Exome Sequencing Project; PMD, Protein Mutant Database.
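
Several entries in Table 1 (Condel, MetaSVM, MetaLR, REVEL and the ADME prediction framework) combine the outputs of individual predictors into a single score. The following is a minimal sketch of that general idea, written in the style of a logistic-regression meta-score. It is not the published implementation of any listed tool: the weights, intercept, threshold and example scores are hypothetical placeholders, and the sketch assumes that every component score has already been rescaled to [0, 1] so that higher values mean "more likely deleterious" (raw scores from tools such as SIFT run in the opposite direction and would need to be inverted first).

```python
import math

# Hypothetical weights and intercept, for illustration only.
# Real ensemble tools fit such parameters on labelled training variants.
WEIGHTS = {"SIFT": 1.5, "PolyPhen-2": 2.0, "LRT": 1.0}
INTERCEPT = -2.25


def meta_score(scores, weights=WEIGHTS, intercept=INTERCEPT):
    """Combine per-predictor scores into one probability-like value in (0, 1).

    `scores` maps predictor name -> score rescaled to [0, 1], where higher
    means more likely deleterious (an assumption of this sketch).
    """
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    # Weighted sum of the component scores plus an intercept ...
    z = intercept + sum(w * scores[name] for name, w in weights.items())
    # ... mapped to (0, 1) by the logistic function.
    return 1.0 / (1.0 + math.exp(-z))


def classify(scores, threshold=0.5):
    """Label a variant using a hypothetical decision threshold."""
    return "deleterious" if meta_score(scores) >= threshold else "neutral"


if __name__ == "__main__":
    variant = {"SIFT": 0.9, "PolyPhen-2": 0.8, "LRT": 0.7}
    print(round(meta_score(variant), 3), classify(variant))
```

In practice the weights would be estimated from the training sets listed in the table, and the decision threshold would be tuned to the application, for example to the gene family of interest.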