Table 2.
Effects of experimental and sequence variables on prediction power
| Model | Experimental variables | Sequence variables | DSPred error<sup>f</sup> | Correlation<sup>g</sup> | ROC area<sup>h</sup>, DS > 3 | ROC area<sup>h</sup>, DS ≥ 5 |
|---|---|---|---|---|---|---|
| A. Best with expt & seq<sup>a</sup> | R30, YldS, SECR1, DLSMR | MW, Dismax | 1.96 (0.13) | 0.56 (0.06) | 0.77 (0.04) | 0.87 (0.05) |
| B. Leave out seq from A<sup>b</sup> | R30, (YldS), SECR1, DLSMR | | 2.73 (0.08) | −0.07 (0.06) | 0.61 (0.05) | 0.49 (0.06) |
| C. Leave out expt from A<sup>c</sup> | | MW, Dismax | 2.46 (0.10) | 0.18 (0.07) | 0.65 (0.05) | 0.69 (0.06) |
| D. Best with expt only<sup>d</sup> | R30, YldS, SECPP<sup>d</sup>, DLSMW<sup>d</sup>, LPav<sup>d</sup> | | 1.90 (0.06) | 0.57 (0.04) | 0.70 (0.08) | 0.71 (0.08) |
| E. Best with seq only<sup>e</sup> | | MW, Dismax, Hydav<sup>e</sup>, XP<sup>e</sup> | 2.58 (0.12) | 0.17 (0.08) | 0.64 (0.05) | 0.63 (0.06) |

For descriptions of variables see Table 1.
<sup>a</sup> Best partition model combining experimental and sequence variables, built from the 77-sample training set.
<sup>b</sup> The 4 experimental variables from model A were supplied to the partition algorithm. The algorithm discarded YldS as a criterion.
<sup>c</sup> The 2 sequence variables from model A were supplied to the algorithm; the algorithm used both as criteria.
<sup>d</sup> All experimental variables were supplied. The algorithm used 2 of the same variables as in model A, replaced SECR1 and DLSMR with the related variables SECPP and DLSMW, and added LPav.
<sup>e</sup> All sequence variables were supplied; hydropathy (Hydav) and XtalPred score (XP) were added to the sequence variables used in model A.
Three measures of predictive power for the 30-sample test set (parentheses: standard deviation estimated from synthetic data).
<sup>f</sup> Square root of the mean square difference between predicted and observed diffraction scores (DS).
<sup>g</sup> Pearson's correlation coefficient for predicted and observed DS.
<sup>h</sup> Area under ROC curves as in Figure 4b, with success defined as “better than 10 Å diffraction” (DS > 3) or as “2.8 Å or better diffraction” (DS ≥ 5).
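
The three measures of predictive power defined in the footnotes can be illustrated with a minimal sketch. This is not the software used to produce the table; the function names and the sample `observed` and `predicted` scores below are hypothetical and chosen only for illustration. The ROC area is computed here through the rank-based (Mann-Whitney) equivalence, which is one common way to obtain it.

```python
import math

def rmse(pred, obs):
    """Square root of the mean square difference between predicted and observed DS."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def pearson_r(pred, obs):
    """Pearson's correlation coefficient between predicted and observed DS."""
    n = len(obs)
    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)

def roc_area(pred, success):
    """Area under the ROC curve: the fraction of (success, failure) pairs in
    which the success has the higher predicted score; ties count one half."""
    pos = [p for p, s in zip(pred, success) if s]
    neg = [p for p, s in zip(pred, success) if not s]
    if not pos or not neg:
        raise ValueError("need at least one success and one failure")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical diffraction scores, NOT data from the study.
observed = [1, 2, 4, 5, 6, 3]
predicted = [2.0, 1.5, 3.5, 4.0, 5.5, 4.5]

error = rmse(predicted, observed)
corr = pearson_r(predicted, observed)
# The two success definitions used in the table: DS > 3 and DS >= 5.
auc_gt3 = roc_area(predicted, [o > 3 for o in observed])
auc_ge5 = roc_area(predicted, [o >= 5 for o in observed])
print(f"error={error:.2f} r={corr:.2f} AUC(DS>3)={auc_gt3:.2f} AUC(DS>=5)={auc_ge5:.2f}")
```

Because the ROC area depends only on how the predictions rank successes against failures, a model can have a large DSPred error and still separate the two classes reasonably well, which is why the table reports the measures side by side.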