2018 Dec 4;9:1437. doi: 10.3389/fphar.2018.01437

Table 1.

Methods to predict the functional effect of missense variants based on sequence information.

| Algorithm | Model | Basis of decision | Model training or evaluation | References |
|---|---|---|---|---|
| SIFT | Direct | Prediction of functionality based on sequence conservation metrics that make use of Dirichlet priors | Variants from protein-specific studies (LacI, HIV-1 protease and bacteriophage T4 lysozyme) | Ng and Henikoff, 2001 |
| PANTHER | HMM | Sequence conservation analysis using HMM | Variants from HGMD and dbSNP as deleterious and functionally neutral variants, respectively | Thomas et al., 2003 |
| MAPP | Direct | Quantification of the physicochemical characteristics at each position of the amino acid sequence based on observed evolutionary variation | Protein-specific studies (LacI, HIV-1 protease, HIV reverse transcriptase and bacteriophage T4 lysozyme) | Stone and Sidow, 2005 |
| PhastCons | HMM | Identification of conserved elements using a two-state phylogenetic HMM | Calibration on genomes from four model species (human, D. melanogaster, C. elegans, and S. cerevisiae) | Siepel et al., 2005 |
| SNPs3D | SVM | Variant effect prediction based on amino acid sequence conservation metrics and folded-state stability of the protein structure | Variants from HGMD and dbSNP as deleterious and functionally neutral variants, respectively | Yue et al., 2006 |
| PhD-SNP | SVM | Prediction of variant pathogenicity based on sequence profiles | Variants from HumVar and HumVarProf datasets | Capriotti et al., 2006 |
| SiPhy | HMM | Sequence conservation analysis using HMM | ENCODE Phase I regions | Garber et al., 2009 |
| LRT | Direct | Evolutionary conservation model across 32 vertebrates | Variants in three sequenced human genomes | Chun and Fay, 2009 |
| SNPs&GO | SVM | Variant effect prediction based on sequence information, evolutionary conservation and a defined gene ontology score | Variants from SwissProt | Calabrese et al., 2009 |
| B-SIFT | Direct | Sequence conservation metrics that calculate the difference between wild-type and mutant alleles | Variants from the SwissProt database and a protein-specific study (DNase I) | Lee et al., 2009 |
| PolyPhen-2 | NB | Sequence conservation and structural parameters such as hydrophobic propensity and B factor | Variants from HumDiv and HumVar from the UniProt database | Adzhubei et al., 2010 |
| MutationTaster | NB | Prediction of mutation pathogenicity based on evolutionary conservation, splice-site changes, loss of protein features and changes that affect expression levels | Variants from the OMIM database, HGMD and the literature as pathogenic set and neutral variants from dbSNP as controls | Schwarz et al., 2014 |
| MutationAssessor | Direct | Evolutionary conservation patterns within protein families and across species using combinatorial entropy | Variants from the UniProt database (HumSaVar) | Reva et al., 2011 |
| Condel | Direct | Integration of five algorithms (SIFT, PolyPhen-2, MAPP, MutationAssessor, and Log R Pfam E-value) into a single output score | Variants from HumVar, HumDiv, the Cosmic database and the IARC TP53 database | González-Pérez and López-Bigas, 2011 |
| PROVEAN | Direct | Alignment-based score that can also assess in-frame insertions, deletions, and multiple amino acid substitutions | Missense variants, indels and replacements from the UniProt database | Choi et al., 2012 |
| FATHMM | HMM | Identification of pathogenic variants based on sequence conservation, protein domain-based information and species-specific pathogenicity weights. Also suitable for prediction of non-coding variations. | Variants from the HGMD and UniProt databases | Shihab et al., 2013, 2015; Rogers et al., 2018 |
| VEST | RF | Prioritization of variants underlying Mendelian diseases | Rare variants from the HGMD database as pathogenic set and variants from ESP as controls | Carter et al., 2013 |
| Evolutionary Action | Direct | Prediction of variant effects on evolutionary fitness using a formal genotype-phenotype perturbation equation | Variants from the 1000 Genomes Project | Katsonis and Lichtarge, 2014 |
| MetaSVM | SVM | Ensemble score integrating nine functionality predictors (SIFT, PolyPhen-2, GERP++, MutationTaster, MutationAssessor, FATHMM, LRT, SiPhy and PhyloP) | Variants causing Mendelian diseases as pathogenic set and variants that are not associated with any phenotypes as controls, all from the UniProt database | Dong et al., 2015 |
| MetaLR | RM | Same as MetaSVM but using logistic regression instead of SVM | Same as MetaSVM | Dong et al., 2015 |
| SuSPect | SVM | Sequence conservation metrics, structure features and additional network information | Variants from the Humsavar database | Yates et al., 2014 |
| PredictSNP | EL | Ensemble score integrating six functionality predictors (MAPP, PhD-SNP, PolyPhen-1, PolyPhen-2, SIFT and SNAP) | Variants mainly from the SwissProt, HGMD, dbSNP and Humsavar databases | Bendl et al., 2014 |
| SNAP2 | NN | Prediction of the effect of amino acid variations based on amino acid properties, predicted binding residues, predicted disordered and low-complexity regions, proximity to N- and C-terminus, statistical contact potentials, co-evolving positions, secondary structure and solvent accessibility | Variants from PMD, Swiss-Prot, OMIM, HumVar and protein-specific data sets (LacI) | Hecht et al., 2015 |
| REVEL | RF | Ensemble method tailored specifically for the prediction of rare genetic variant effects, integrating MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons | Variants from HGMD as pathogenic set and neutral variants from ESP as controls | Ioannidis et al., 2016 |
| ConSurf | Empirical Bayesian method and maximum likelihood estimation | Mapping of evolutionarily conserved residues on protein surfaces by estimating the evolutionary rate of each nucleic acid and amino acid sequence position using multiple sequence alignments. Also offers RNA secondary structure predictions. | Proteins with at least five known 3D structure homologs and precise annotation of their functional sites (of differing nature) | Ashkenazy et al., 2016 |
| VIPUR | RM | Combination of sequence- and structure-based features to identify and functionally interpret deleterious variants | Variants from HumDiv and UniProt with clear evidence of protein disruption | Baugh et al., 2016 |
| Envision | GTB | Decision tree ensemble-based tool using a stochastic gradient boosting learning algorithm | Variants from nine large-scale experimental mutagenesis datasets in eight proteins | Gray et al., 2018 |
| EVmutation | Direct | Unsupervised method exploiting sequence conservation by incorporating interaction information between all pairs of residues in a protein | 34 data sets from 21 proteins and a tRNA gene extracted from 27 publications | Hopf et al., 2017 |
| PredSAV | GTB | Identification of pathogenic variants based on sequence, structure, residue-contact networks as well as structural neighborhood features | Human variants from UniProt and OMIM as pathogenic set and Ensembl variants as neutral controls | Pan et al., 2017 |
| SNPMuSiC | NN | Structure stability-based prediction that implements PoPMuSiC and HoTMuSiC on the basis of 13 statistical potentials (distance potentials, solvent accessibility potentials and torsion potentials) and 2 biophysical characteristics (solvent accessibility of the mutated residue and difference in volume) | Variants from dbSNP, SwissVar and HumSaVar datasets | Ancien et al., 2018 |
| DEOGEN2 | RF | Integration of 11 scores and metrics into one meta-score, considering evolutionary features, folding predictions, domain information as well as gene features to identify deleterious variants | Training and testing on variants from the UniProt Humsavar16 dataset | Raimondi et al., 2017 |
| ADME prediction framework | Direct | Integration of prediction scores from five orthogonal algorithms (LRT, MutationAssessor, PROVEAN, VEST3 and CADD) using parameters optimized for pharmacogenes | Training and validation specifically on experimentally characterized pharmacogenetic data sets from 43 ADME genes | Zhou et al., 2018 |

HMM, hidden Markov model; SVM, support vector machine; NB, naïve Bayes classifier; EL, ensemble learning; RF, random forest; RM, regression model; NN, neural networks; GTB, gradient tree boosting; HGMD, Human Gene Mutation Database; OMIM, Online Mendelian Inheritance in Man; ESP, Exome Sequencing Project; PMD, Protein Mutant Database.
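
Most of the methods in Table 1 build on some measure of sequence conservation across homologs. As a rough illustration only, the sketch below scores one column of a multiple sequence alignment with a normalized Shannon entropy. It is not the metric used by any algorithm in the table (SIFT, for example, additionally uses Dirichlet priors). The function name `column_conservation` and the toy alignment are hypothetical and introduced here purely for illustration.

```python
import math
from collections import Counter


def column_conservation(alignment, position):
    """Return a conservation score in [0, 1] for one alignment column.

    Score = 1 - H / log2(20), where H is the Shannon entropy of the
    residue frequencies observed in the column (gap characters '-' are
    ignored). An invariant column scores 1.0; more variable columns
    score lower. This is an illustrative metric, not the scoring
    scheme of any specific tool.
    """
    residues = [seq[position] for seq in alignment if seq[position] != "-"]
    if not residues:
        return 0.0  # column contains only gaps: no evidence of conservation
    total = len(residues)
    counts = Counter(residues)
    entropy = sum(-(n / total) * math.log2(n / total) for n in counts.values())
    return 1.0 - entropy / math.log2(20)


# Toy alignment of four hypothetical homologous sequences (assumption).
toy_alignment = ["MKLV", "MKIV", "MRLV", "MKLA"]

scores = [column_conservation(toy_alignment, i) for i in range(4)]
for i, score in enumerate(scores):
    print(f"position {i}: conservation = {score:.3f}")
```

Under this simplified metric, a substitution at a high-scoring position (such as the invariant first column of the toy alignment) would be flagged as more likely to be deleterious than one at a variable position. The predictors listed in the table refine this idea with phylogenetic weighting, structural features, or supervised training.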