2001 May;11(5):863–874. doi: 10.1101/gr.176601

Table 1.

Summary of Prediction Results for SIFT and BLOSUM62

Test set	Method	Tolerant prediction accuracy	Deleterious prediction accuracy	Total prediction accuracy	Experimental prediction accuracy

LacI^* n = 4004	SIFT	78% (1747/2254)	57% (989/1750)	68% (2736/4004)	66% (989/1496)
	BLOSUM62	31% (696/2254)	84% (1475/1750)	54% (2171/4004)	49% (1475/3033)
HIV-1 Protease n = 336	Automated SIFT	70% (78/111)	82% (184/225)	78% (262/336)	85% (184/217)
	SIFT without RSV, avian sequences	68% (75/111)	88% (197/225)	81% (272/336)	85% (197/233)
	BLOSUM62	63% (70/111)	73% (165/225)	70% (235/336)	80% (165/206)
Bacteriophage T4 lysozyme n = 2015	SIFT	59% (817/1377)	72% (460/638)	63% (1277/2015)	45% (460/1020)
	BLOSUM62	30% (406/1377)	85% (542/638)	47% (948/2015)	36% (542/1513)

The effects of 4004 substitutions were assayed for LacI (Markiewicz et al. 1994; Pace et al. 1997), 336 substitutions for HIV-1 protease (Loeb et al. 1989), and 2015 substitutions for bacteriophage T4 lysozyme (Rennell et al. 1991). These three data sets are used to test prediction performance. Tolerant prediction accuracy is the number of substitutions correctly predicted to have no effect divided by the total number of substitutions that gave a wild-type phenotype under the experimental test conditions. Subtracting the numerator from the denominator gives the number of substitutions that were predicted to be deleterious but gave a wild-type phenotype under experimental conditions. Deleterious prediction accuracy is the number of substitutions correctly predicted to have an effect on the protein divided by the number of substitutions that affected the protein experimentally. Subtracting the numerator from the denominator gives the number of substitutions that were predicted to have a wild-type phenotype but gave a deleterious phenotype under experimental conditions. Total prediction accuracy is the total number of substitutions correctly predicted divided by the total number of substitutions. Experimental prediction accuracy is the number of substitutions predicted to affect protein function that were also experimentally shown to affect it, divided by the total number of substitutions predicted to affect function. For the biologist investigating substitutions predicted to have a deleterious effect, the experimental prediction accuracy reflects the proportion of those predictions that will yield affected phenotypes experimentally.
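The four accuracies defined above can be recomputed from the two count pairs reported in each row. The following is a minimal sketch, not code from the paper: the `Counts` class, its field names, and the `accuracies` function are hypothetical names introduced for illustration. It assumes, as the table's denominators imply, that the number of substitutions predicted to be deleterious equals the correctly predicted deleterious substitutions plus the tolerated substitutions that were wrongly predicted deleterious.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Counts for one method on one test set (hypothetical field names)."""
    tolerant_correct: int     # predicted tolerated, wild-type phenotype observed
    tolerant_total: int       # all substitutions with a wild-type phenotype
    deleterious_correct: int  # predicted deleterious, affected phenotype observed
    deleterious_total: int    # all substitutions with an affected phenotype


def accuracies(c: Counts) -> dict:
    """Return the four accuracies from the table legend as fractions."""
    # Tolerated substitutions that were wrongly predicted to be deleterious.
    wrongly_deleterious = c.tolerant_total - c.tolerant_correct
    # Everything the method predicted to be deleterious, right or wrong.
    predicted_deleterious = c.deleterious_correct + wrongly_deleterious
    total = c.tolerant_total + c.deleterious_total
    return {
        "tolerant": c.tolerant_correct / c.tolerant_total,
        "deleterious": c.deleterious_correct / c.deleterious_total,
        "total": (c.tolerant_correct + c.deleterious_correct) / total,
        "experimental": c.deleterious_correct / predicted_deleterious,
    }


# Counts taken from the SIFT row for LacI in Table 1.
laci_sift = accuracies(Counts(1747, 2254, 989, 1750))
for name, value in laci_sift.items():
    print(f"{name}: {value:.0%}")
# tolerant: 78%
# deleterious: 57%
# total: 68%
# experimental: 66%
```

For the LacI SIFT row, the predicted-deleterious count works out to 989 + (2254 − 1747) = 1496, which matches the denominator of the experimental prediction accuracy in the table.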

^*SIFT offers predictions only for positions 5–329 of the LacI repressor because fewer than half of the sequences are represented at positions 1–4 and 330–360.
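The coverage criterion in this footnote can be illustrated with a short sketch. This is not SIFT's implementation: the function `covered_positions`, the `-` gap character, and the toy alignment are all assumptions made for illustration. It simply keeps the alignment columns at which at least half of the sequences have a residue.

```python
def covered_positions(alignment, min_fraction=0.5, gap="-"):
    """Return 1-based columns where at least min_fraction of sequences
    have a residue (hypothetical helper, not part of SIFT)."""
    n_seqs = len(alignment)
    length = len(alignment[0])
    covered = []
    for col in range(length):
        # Count sequences that are represented (non-gap) at this column.
        present = sum(1 for seq in alignment if seq[col] != gap)
        if present / n_seqs >= min_fraction:
            covered.append(col + 1)
    return covered


# Toy alignment (invented data): only the first sequence spans the ends.
toy_alignment = [
    "MKPVT",
    "-KPV-",
    "--PV-",
    "-RPA-",
]
print(covered_positions(toy_alignment))  # [2, 3, 4]
```

In the toy alignment, columns 1 and 5 are each represented in only one of four sequences, so they are excluded in the same way that positions 1–4 and 330–360 of LacI are excluded in the footnote.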