. 2018 Dec 4;9:1437. doi: 10.3389/fphar.2018.01437

Table 1.

Methods to predict the functional effect of missense variants based on sequence information.

Algorithm	Model	Basis of decision	Model training or evaluation	References
SIFT	Direct	Prediction of functionality based on sequence conservation metrics that make use of Dirichlet priors	Variants from protein specific studies (LacI, HIV-1 Protease and Bacteriophage T4 Lysozyme)	Ng and Henikoff, 2001
PANTHER	HMM	Sequence conservation analysis using HMM	Variants from HGMD and dbSNP as deleterious and functionally neutral variants, respectively	Thomas et al., 2003
MAPP	Direct	Quantification of the physicochemical characteristics at each position of the amino acid sequence based on observed evolutionary variation	Protein specific studies (LacI, HIV-1 Protease, HIV reverse transcriptase and Bacteriophage T4 Lysozyme)	Stone and Sidow, 2005
PhastCons	HMM	Identification of conserved elements using a two-state phylogenetic HMM	Calibration on genomes from four model species (human, D. melanogaster, C. elegans, and S. cerevisiae)	Siepel et al., 2005
SNPs3D	SVM	Variant effect prediction based on amino acid sequence conservation metrics and folded state stability of protein structure	Variants from HGMD and dbSNP as deleterious and functionally neutral variants, respectively	Yue et al., 2006
PhD-SNP	SVM	Prediction of variant pathogenicity based on sequence profiles	Variants from HumVar and HumVarProf datasets	Capriotti et al., 2006
SiPhy	HMM	Sequence conservation analysis using HMM	ENCODE Phase I regions	Garber et al., 2009
LRT	Direct	Evolutionary conservation model across 32 vertebrates	Variants in three sequenced human genomes	Chun and Fay, 2009
SNPs&GO	SVM	Variant effect prediction based on sequence information, evolutionary conservation and defined gene ontology score	Variants from SwissProt	Calabrese et al., 2009
B-SIFT	Direct	Sequence conservation metrics that calculate the difference between wild-type and mutant allele	Variants from SwissProt database and protein specific study (DNase I)	Lee et al., 2009
PolyPhen-2	NB	Sequence conservation and structural parameters such as hydrophobic propensity and B-factor	Variants from HumDiv and HumVar (UniProt database)	Adzhubei et al., 2010
MutationTaster	NB	Prediction of mutation pathogenicity based on evolutionary conservation, splice-site changes, loss of protein features and changes that affect expression levels	Variants from OMIM database, HGMD and the literature as pathogenic set and neutral variants from dbSNP as controls	Schwarz et al., 2014
MutationAssessor	Direct	Evolutionary conservation patterns within protein families and across species using combinatorial entropy	Variants from UniProt database (HumSaVar)	Reva et al., 2011
Condel	Direct	Integration of five algorithms (SIFT, PolyPhen-2, MAPP, MutationAssessor, and Log R Pfam E-value) into a single output score	Variants from HumVar, HumDiv, COSMIC database, IARC TP53 database	González-Pérez and López-Bigas, 2011
PROVEAN	Direct	Alignment-based score that can also assess in-frame insertions, deletions, and multiple amino acid substitutions	Missense variants, indels, and replacements from the UniProt database	Choi et al., 2012
FATHMM	HMM	Identification of pathogenic variants based on sequence conservation, protein domain-based information and species-specific pathogenicity weights. Also suitable for prediction of non-coding variations.	Variants from the HGMD and UniProt databases	Shihab et al., 2013, 2015; Rogers et al., 2018
VEST	RF	Prioritization of variants underlying Mendelian diseases	Rare variants from HGMD database as pathogenic set and variants from ESP	Carter et al., 2013
Evolutionary Action	Direct	Prediction of variant effects on evolutionary fitness using a formal genotype-phenotype perturbation equation	Variants from 1000 Genomes Project	Katsonis and Lichtarge, 2014
MetaSVM	SVM	Ensemble score integrating nine functionality predictors (SIFT, PolyPhen-2, GERP++, MutationTaster, MutationAssessor, FATHMM, LRT, SiPhy and PhyloP)	Variants causing Mendelian diseases as pathogenic set and variants that are not associated with any phenotypes as controls, all from the UniProt database	Dong et al., 2015
MetaLR	RM	Same as MetaSVM but using logistic regression instead of SVM.		Dong et al., 2015
SuSPect	SVM	Sequence conservation metrics, structure features and additional network information	Variants from Humsavar database	Yates et al., 2014
PredictSNP	EL	Ensemble score integrating six functionality predictors (MAPP, PhD-SNP, PolyPhen-1, PolyPhen-2, SIFT and SNAP)	Variants mainly from SwissProt, HGMD, dbSNP and Humsavar database	Bendl et al., 2014
SNAP2	NN	Prediction of amino acid variations based on amino acid properties, predicted binding residues, predicted disordered and low-complexity regions, proximity to N- and C-terminus, statistical contact potentials, co-evolving positions, secondary structure and solvent accessibility	Variants from PMD, Swiss-Prot, OMIM, HumVar and protein specific data sets (LacI)	Hecht et al., 2015
REVEL	RF	Ensemble method tailored specifically for the prediction of rare genetic variant effects integrating MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons	Variants from HGMD as pathogenic set and neutral variants from ESP as controls	Ioannidis et al., 2016
ConSurf	Empirical Bayesian method and maximum likelihood estimation	Mapping of evolutionarily conserved residues on protein surfaces by estimating the evolutionary rates of each nucleic acid and amino acid sequence position using multiple sequence alignments. Also offers RNA secondary structure predictions.	Proteins with at least five known 3D structure homologs and precise annotation of their functional sites (of differing nature)	Ashkenazy et al., 2016
VIPUR	RM	Combination of sequence- and structure-based features to identify and functionally interpret deleterious variants	Variants from HumDiv and UniProt with clear evidence of protein disruption	Baugh et al., 2016
Envision	GTB	Decision tree ensemble-based tool using a stochastic gradient boosting learning algorithm	Variants from nine large-scale experimental mutagenesis datasets in eight proteins	Gray et al., 2018
EVmutation	Direct	Unsupervised method exploiting sequence conservation by incorporating interaction information between all pairs of residues in protein	34 data sets from 21 proteins and a tRNA gene extracted from 27 publications	Hopf et al., 2017
PredSAV	GTB	Identification of pathogenic variants based on sequence, structure, residue-contact networks as well as structural neighborhood features	Human variants from UniProt and OMIM as pathogenic set and Ensembl variants as neutral controls	Pan et al., 2017
SNPMuSiC	NN	Structure stability-based prediction implementing PoPMuSiC and HoTMuSiC on the basis of 13 statistical potentials (distance potentials, solvent accessibility potentials and torsion potentials) and 2 biophysical characteristics (solvent accessibility of the mutated residue and difference in volume)	Variants from dbSNP, SwissVar and HumSaVar datasets	Ancien et al., 2018
DEOGEN2	RF	Integration of 11 scores and metrics into one meta-score, considering evolutionary features, folding predictions, domain information as well as gene features to identify deleterious variants	Training and testing on variants from the UniProt Humsavar16 dataset	Raimondi et al., 2017
ADME prediction framework	Direct	Integration of prediction scores from five orthogonal algorithms (LRT, MutationAssessor, PROVEAN, VEST3 and CADD) using parameters optimized for pharmacogenes	Training and validation specifically on experimentally characterized pharmacogenetic data sets from 43 ADME genes	Zhou et al., 2018

HMM, hidden Markov model; SVM, support vector machine; NB, naïve Bayes classifier; EL, ensemble learning; RF, random forest; RM, regression model; NN, neural networks; GTB, gradient tree boosting; HGMD, Human Gene Mutation Database; OMIM, Online Mendelian Inheritance in Man; ESP, Exome Sequencing Project; PMD, Protein Mutant Database.
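
Many of the methods in Table 1 rely on some measure of sequence conservation derived from a multiple sequence alignment. As a purely illustrative sketch, and not the scoring scheme of any listed tool, per-position conservation can be approximated as one minus the normalized Shannon entropy of each alignment column. The function names, the toy alignment, and the choice to ignore gaps are assumptions made for this example only.

```python
import math
from collections import Counter


def column_conservation(column, alphabet_size=20):
    """Return 1 - normalized Shannon entropy for one alignment column.

    Illustrative only: gaps ("-") are ignored, and the entropy is scaled by
    log2(alphabet_size), so 1.0 means fully conserved.
    """
    residues = [r for r in column if r != "-"]  # assumption: skip gaps
    if not residues:
        return 0.0  # no information in an all-gap column
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 1.0 - entropy / math.log2(alphabet_size)


def conservation_profile(alignment):
    """Score every column of an alignment given as equal-length strings."""
    return [column_conservation(col) for col in zip(*alignment)]


# Hypothetical toy alignment of four homologous sequences.
toy_alignment = ["MKVLA", "MKILA", "MRVLG", "MKVLS"]
profile = conservation_profile(toy_alignment)
for position, score in enumerate(profile, start=1):
    print(f"position {position}: conservation = {score:.3f}")
```

Under this simplified scheme, a substitution at a position with a score close to 1 would be flagged as more likely to affect function than one at a variable position. The published tools refine this idea considerably, for example with Dirichlet priors (SIFT) or phylogenetic models (PhastCons, LRT).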