PLoS ONE. 2011 Jul 18;6(7):e20451. doi: 10.1371/journal.pone.0020451

Table 1. Summary of tolerated sequence prediction performance on different datasets using the generalized protocol described here.

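| Dataset | Proteins | Residue positions | Bits of information (phage display) | Bits of information (predicted) | Fraction Top 5 (%) | AAD (%) | AUC | Rank Top |
|---|---|---|---|---|---|---|---|---|
| GB1 (kT = 0.23) | 1 | 6 | 1.58 | 2.66 | 56.9 | 5.61 | 0.74 | 6.17 |
| GB1 (kT = 0.59) | 1 | 6 | 1.58 | 0.89 | 54.2 | 4.05 | 0.71 | 7.17 |
| hGH/hGHR¹ | 1 | 16 | 1.19 | 3.58 | 59.3 | 7.46 | 0.75 | 6.00 |
| hGH/hGHR² | 1 | 35 | 0.89 | 3.24 | 41.9 | 7.48 | 0.64 | 7.72 |
| PDZ/Peptide | 5 | 25 | 3.11 | 2.82 | 81.7 | 4.16 | 0.87 | 2.84 |
| PDZ/Peptide³ | 5 | 25 | 3.11 | 3.06 | 82.0 | 3.67 | 0.88 | 2.76 |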
¹ 16 designed hGH amino acid positions as defined in [23] and shown in Figure 3.

² All designed hGH amino acid positions shown in Figure S4.

³ Performance metrics based on position weight matrices from Smith & Kortemme 2010 [35].

Scoring metrics are used as defined previously [35]. Fraction Top 5 gives the fraction of amino acids with phage display frequencies ≥10% that appear among the top 5 ranked predicted amino acids, averaged over all positions. AAD gives the average absolute difference in amino acid frequency between prediction and phage display. AUC gives the area under the receiver operating characteristic curve, with true positives defined as amino acids with phage display frequencies ≥10%. Rank Top gives the average rank, in the prediction, of the amino acid most frequently observed in phage display.

The table gives results from one set of predictions as described in Methods. To gauge the variability, we repeated the predictions three times and calculated the standard deviation of each scoring metric. The absolute standard deviations and dynamic ranges are 0.4/4.32 (Bits Predicted), 1.9/100 (Fraction Top 5), 0.4/10 (AAD), 0.006/1 (AUC), and 0.2/19 (Rank Top). As a percentage of the dynamic range of a given metric, the average standard deviations (over the first 5 rows) were: 0.9% (Bits Predicted), 1.9% (Fraction Top 5), 0.4% (AAD), 0.6% (AUC), and 1.1% (Rank Top).
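To make the four scoring metrics defined above concrete, the following Python sketch computes them for a single residue position. It is an illustration under stated assumptions, not the code used in [35]: the function names, the `profile` helper, the tie-breaking rule, and the toy frequencies are all hypothetical, and it assumes AAD is averaged over the 20 amino acids (which matches the stated dynamic range of 10%) and that AUC is computed per position. The values in the table would then be averages of such per-position values.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def profile(**freqs):
    """Hypothetical helper: 20-letter frequency profile, unspecified entries are 0."""
    return {aa: freqs.get(aa, 0.0) for aa in AMINO_ACIDS}

def ranked(pred):
    """Amino acids by predicted frequency, highest first.
    Ties keep alphabetical-code order (an arbitrary choice for this sketch)."""
    return sorted(AMINO_ACIDS, key=lambda aa: pred[aa], reverse=True)

def fraction_top5(pred, obs, cutoff=0.10):
    """Fraction of amino acids observed at >= cutoff that are in the predicted top 5."""
    observed = [aa for aa in AMINO_ACIDS if obs[aa] >= cutoff]
    if not observed:
        return None  # undefined when nothing passes the cutoff
    top5 = set(ranked(pred)[:5])
    return sum(aa in top5 for aa in observed) / len(observed)

def aad(pred, obs):
    """Average absolute difference in frequency, in percent (assumed mean over 20 types)."""
    total = sum(abs(pred[aa] - obs[aa]) for aa in AMINO_ACIDS)
    return 100.0 * total / len(AMINO_ACIDS)

def rank_top(pred, obs):
    """1-based predicted rank of the most frequently observed amino acid."""
    best = max(AMINO_ACIDS, key=lambda aa: obs[aa])
    return ranked(pred).index(best) + 1

def auc(pred, obs, cutoff=0.10):
    """ROC area via pairwise comparison: probability that a true positive
    (observed >= cutoff) gets a higher predicted frequency than a negative."""
    pos = [pred[aa] for aa in AMINO_ACIDS if obs[aa] >= cutoff]
    neg = [pred[aa] for aa in AMINO_ACIDS if obs[aa] < cutoff]
    if not pos or not neg:
        return None  # undefined without both classes
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy frequencies for one position (made up for illustration only).
observed = profile(A=0.5, V=0.3, L=0.2)
predicted = profile(A=0.4, L=0.3, G=0.2, V=0.1)

print("Fraction Top 5:", fraction_top5(predicted, observed))
print("AAD (%):", aad(predicted, observed))
print("Rank Top:", rank_top(predicted, observed))
print("AUC:", auc(predicted, observed))
```

Under these assumptions the ranges line up with the dynamic ranges quoted above: Rank Top runs from 1 to 20 (a span of 19), AAD from 0% to 10%, and AUC from 0 to 1.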