Table 2. Amino acid substitution (AAS) prediction methods used in this study.
Program | Input | Algorithm | Output | URL | Reference |
---|---|---|---|---|---|
SIFT | PS and AAS, protein sequence alignment and AAS, dbSNP id, or protein id | Uses sequence homology; scores are based on position-specific scoring matrices with Dirichlet priors | Score ranges from 0 to 1, where ≤0.05 is damaging and >0.05 is tolerated | http://sift.jcvi.org/www/SIFT_enst_submit.html | Ng and Henikoff, 2001 [63] |
PolyPhen-2 | PS and AAS, dbSNP id, HGVbase id, or protein id | Uses sequence conservation, structure-based modeling of the location of the amino acid substitution, and Swiss-Prot and TrEMBL annotation | Score ranges from 0 to 1, where ≤0.05 is benign and >0.05 is damaging | http://genetics.bwh.harvard.edu/pph2/ | Ramensky et al. 2002 [50] |
PANTHER-PSEP | PS and AAS | Uses sequence homology; scores are based on PANTHER Hidden Markov Model families | Probably damaging: time > 450 my; possibly damaging: 450 my > time > 200 my; probably benign: time < 200 my | http://www.pantherdb.org/tools/csnpScoreForm.jsp | Tang and Thomas, 2016 [64] |
MutPred | Protein id, PS, or multiple sequence alignment | Prediction is based on one of two neural networks, which use internal databases, secondary structure prediction, and sequence conservation | Score ranges from 0 to 1, where 0 indicates a polymorphism and high scores are predicted to be deleterious/disease-associated | http://mutpred.mutdb.org/ | Li et al. 2009 [65] |
MutationTaster | DNA sequence | Predictions are calculated by a naive Bayes classifier, which predicts the disease potential | Prediction is one of four possible types: (a) disease causing: probably deleterious; (b) disease causing automatic: known to be deleterious; (c) polymorphism: probably harmless; (d) polymorphism automatic: known to be harmless | http://www.mutationtaster.org/ | Schwarz et al. 2014 [53] |
PROVEAN | PS and AAS | Uses an alignment-based scoring approach to generate predictions for single and multiple amino acid substitutions and for in-frame insertions and deletions | Default score threshold is -2.5, where >-2.5 is neutral and <-2.5 is deleterious | http://provean.jcvi.org/index.php | Choi and Chan, 2015 [54] |
PMUT | PS and AAS, dbSNP id, or UniProt or PDB id of the protein | Based on neural networks that use internal databases, secondary structure prediction, and sequence conservation | Score ranges from 0 to 1, where <0.50 is neutral and >0.50 is disease associated | http://mmb.pcb.ub.es/pmut2017/analyses/new/ | Ferrer-Costa et al. 2002 [55] |
FATHMM | Protein id and AAS, or dbSNP id | Uses sequence homology | Score threshold is -2.5, where >-2.5 is neutral and <-2.5 is deleterious | http://fathmm.biocompute.org.uk/index.html | Shihab et al. 2013 [56] |
nsSNPAnalyzer | Protein sequence in FASTA format and a substitution file denoting the SNP identities to be analyzed | Uses information from the multiple sequence alignment and the three-dimensional protein structure to make predictions | Normalized probability of the substitution calculated by the SIFT program | http://snpanalyzer.uthsc.edu/ | Bao et al. 2005 [57] |
Align-GVGD | Protein sequence in FASTA format and a substitution file denoting the SNP identities to be analyzed | Uses biophysical features of amino acids and protein multiple sequence alignments | A value of C > 0 is considered deleterious; otherwise the variant is considered neutral | http://agvgd.hci.utah.edu/ | Tavtigian et al. 2006 [58] |
REVEL | Precomputed REVEL scores are provided for all possible human missense variants | Prediction is based on a combination of scores from 13 individual tools | Score ranges from 0 to 1, where <0.50 is neutral and >0.50 is pathogenic | https://sites.google.com/site/revelgenomics/ | Ioannidis et al. 2016 [59] |
AAS, amino acid substitution; PS, protein sequence; PDB, Protein Data Bank; my, million years.
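
The numeric cutoffs in the Output column can be applied programmatically. The following is a minimal sketch, not part of any of the listed tools: the function names (`classify`, `_panther_psep`) and the lookup table are hypothetical, and the handling of values that fall exactly on a cutoff is an assumption wherever the table does not specify it. Only tools with a single numeric cutoff in Table 2 are included.

```python
from typing import Callable, Dict


def _panther_psep(time_my: float) -> str:
    """Classify a PANTHER-PSEP preservation time given in million years."""
    if time_my > 450:
        return "probably damaging"
    if time_my < 200:
        return "probably benign"
    # Assumption: the boundary values 200 and 450 fall in the middle class.
    return "possibly damaging"


# Cutoffs transcribed from Table 2. Keys are lowercased tool names.
# Assumption: a score exactly on a cutoff is assigned to the neutral class,
# except for SIFT, where the table states that <= 0.05 is damaging.
_CLASSIFIERS: Dict[str, Callable[[float], str]] = {
    "sift": lambda s: "damaging" if s <= 0.05 else "tolerated",
    "provean": lambda s: "deleterious" if s < -2.5 else "neutral",
    "fathmm": lambda s: "deleterious" if s < -2.5 else "neutral",
    "pmut": lambda s: "disease associated" if s > 0.50 else "neutral",
    "revel": lambda s: "pathogenic" if s > 0.50 else "neutral",
    "panther-psep": _panther_psep,
}


def classify(tool: str, score: float) -> str:
    """Return the Table 2 label for a raw score from the named tool."""
    key = tool.strip().lower()
    if key not in _CLASSIFIERS:
        raise KeyError(f"No numeric cutoff recorded for tool: {tool}")
    return _CLASSIFIERS[key](score)


if __name__ == "__main__":
    # Illustrative scores only; these are not results from the study.
    for tool, score in [("SIFT", 0.01), ("REVEL", 0.73), ("PANTHER-PSEP", 300)]:
        print(tool, score, classify(tool, score))
```

Keeping the cutoffs in a single lookup table makes it straightforward to compare calls across tools for the same variant, and to adjust a threshold in one place if a tool's recommended cutoff changes.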