2005 Mar 22;33(5):1710–1721. doi: 10.1093/nar/gki311

Table 1.

This table summarizes the sequence-based features that have been used to identify amino acid substitutions that could be deleterious to the protein, and the results obtained from these studies.

Sequence-based features | Comment | Reference
Properties based on amino acid scale
    Mass, volume, surface area, side-chain properties (charge, polarity), partial specific volume, hydrophobicity, alpha helix propensity, relative occurrence, percent buried, pKa | The physicochemical properties were used as features in a Bayesian framework to predict the pathogenicity of an amino acid variation. A change in hydrophobicity coupled with low positional entropy was shown to be a good predictor. | (29)
Position-specific phylogenetic features
    Positional entropy, modified Shannon entropy and normalized site entropy | Substitutions at evolutionarily conserved sites have been shown to be strongly correlated with disease-causing mutations. Conservation at a position in a protein sequence has been assessed using slightly modified versions of sequence entropy computed from multiple-sequence alignments (MSAs). | (29,30,33–36)
    Change in residue frequency | Residue frequency at a given amino acid position was calculated for both variants from MSAs. Change in residue frequency in conjunction with hydrophobicity correlated with the observed phenotype. | (29)
    Conservation related to allele frequency | Residues absolutely conserved among at least three mammalian orthologs were identified, and variations at these positions were shown to be underrepresented at high allele frequencies compared with variations at unconserved sites. | (31)
    Degree of conservation using tree method | The number of substitutions at a given position in a sequence was estimated based on known phylogenetic relationships between species. Disease-associated mutations were more prevalent at conserved sites. | (32)
    SIFT | Calculates a conservation index based on an MSA. Normalized probabilities for all possible substitutions at a given amino acid position are obtained from the MSA, and substitutions with probabilities below a certain cutoff are predicted not to be tolerated by the protein. | (13,14)
Substitution matrices
    BLOSUM, PAM and GRANTHAM | It was shown that ∼40% of disease-causing changes had highly unfavorable BLOSUM62 scores. Similar general trends were seen for PAM matrix scores (30). A clear correlation between BLOSUM62 scores and allele frequency of nonsynonymous SNPs was not seen in a study of SNPs in membrane-transporter genes (31). BLOSUM62 scores were able to distinguish tolerant from intolerant substitutions in a variety of proteins, with total prediction accuracies ranging from 47 to 70% (13). A balanced classification error of about 40% was reported by Saunders et al. (36) using BLOSUM62 scores as a predictive feature. Miller et al. (32) showed, using GRANTHAM scores, that disease-causing amino acid changes are more radical than variation found among species. | (13,30–32,36)
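
The position-specific features listed in the table (positional entropy, change in residue frequency, and a cutoff on normalized substitution probabilities) can be illustrated with a minimal Python sketch. It is not the method of any of the cited studies: the toy alignment, the function names, the scaling by the most frequent residue, and the 0.05 cutoff are all assumptions made for illustration. In particular, the last two functions only mimic the general idea of a probability cutoff and do not reproduce how SIFT actually derives its probabilities.

```python
import math
from collections import Counter

# Hypothetical toy alignment: four aligned sequences of equal length.
toy_msa = [
    "MKLV",
    "MKIV",
    "MRLV",
    "MKLA",
]

def residue_frequencies(msa, pos):
    """Relative frequency of each residue at 0-based column `pos`, ignoring gaps ('-')."""
    residues = [seq[pos] for seq in msa if seq[pos] != "-"]
    if not residues:
        return {}
    counts = Counter(residues)
    return {aa: n / len(residues) for aa, n in counts.items()}

def positional_entropy(msa, pos):
    """Plain Shannon entropy (in bits) of the column; 0 means fully conserved."""
    freqs = residue_frequencies(msa, pos)
    return sum(-p * math.log2(p) for p in freqs.values())

def frequency_change(msa, pos, wild_type, variant):
    """Difference in column frequency between the wild-type and variant residues."""
    freqs = residue_frequencies(msa, pos)
    return freqs.get(wild_type, 0.0) - freqs.get(variant, 0.0)

def normalized_probabilities(msa, pos):
    """Frequencies scaled so the most common residue has value 1.0 (assumed scheme)."""
    freqs = residue_frequencies(msa, pos)
    if not freqs:
        return {}
    top = max(freqs.values())
    return {aa: p / top for aa, p in freqs.items()}

def is_tolerated(msa, pos, residue, cutoff=0.05):
    """Call a substitution tolerated if its scaled probability reaches the assumed cutoff."""
    return normalized_probabilities(msa, pos).get(residue, 0.0) >= cutoff

if __name__ == "__main__":
    for pos in range(len(toy_msa[0])):
        print(pos, round(positional_entropy(toy_msa, pos), 3))
    print(is_tolerated(toy_msa, 1, "R"))  # R is observed at column 1
    print(is_tolerated(toy_msa, 0, "L"))  # L is never observed at column 0
```

In this toy example, a fully conserved column has zero entropy and any unobserved residue falls below the cutoff. A real analysis would need a much deeper alignment and some handling of unobserved residues (for example pseudocounts) before such a cutoff would be meaningful.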