Table 1.
Properties used by the predictor, organized by feature class.
| Feature value properties | |||
|---|---|---|---|
| Torsion meta-feature | | Accuracy | IG |
| Psamp | Rosetta sampling rate | 88.9% | |
| lowE | 10th percentile energy of models with the feature value | 76.4% | 0.016 |
| minE | minimum energy of models with the feature value | 87.7% | 0.040 |
| frag | rate of occurrence of the feature value in the fragments | 86.2% | 0.039 |
| loop | indicates either an E or O torsion feature value | ||
| Ppred | output of nativeness predictor | 91.1% | 0.081 |
| Secondary structure meta-feature | | Accuracy | IG |
| Psamp | Rosetta sampling rate | 87.2% | |
| lowE | 10th percentile energy of models with the feature value | 72.8% | 0.018 |
| minE | minimum energy of models with the feature value | 86.2% | 0.023 |
| psipred | secondary structure prediction from Psipred | 87.7% | 0.034 |
| jufo | secondary structure prediction from JUFO | 80.9% | 0.010 |
| Ppred | output of nativeness predictor | 91.8% | 0.055 |
| Topology meta-feature | | Accuracy | IG |
| Psamp | Rosetta sampling rate | 21.4% | |
| lowE | 10th percentile energy of models with the feature value | 21.4% | 0.032 |
| minE | minimum energy of models with the feature value | 46.4% | 0.023 |
| co | approximate contact order of a structure with the given topology | ||
| Ppred | output of nativeness predictor | 60.7% | 0.036 |
| Register meta-feature | | Accuracy | IG |
| Psamp | Rosetta sampling rate | 54.0% | |
| lowE | 10th percentile energy of models with the feature value | 44.7% | 0.065 |
| minE | minimum energy of models with the feature value | 61.2% | 0.057 |
| bulge | indicates the presence of at least one beta bulge in the register | ||
| Ppred | output of nativeness predictor | 57.6% | 0.066 |
| Contact meta-feature | | Accuracy | IG |
| Psamp | Rosetta sampling rate | 85.4% | |
| lowE | 10th percentile energy of models with the feature value | 68.9% | 0.002 |
| edgedist | distance (in residue numbers) of a contact from either end of a pairing | ||
| oddpleat | indicates an anomaly in the pleating pattern | ||
| Ppred | output of nativeness predictor | 88.3% | 0.005 |
A property correctly identifies a native feature value if the property is higher (or lower, for energy properties) for the native feature value than for every other value of the associated feature. The “Accuracy” column gives the percentage of features in our benchmark whose native values were correctly identified by each property. Accuracy values are omitted for properties that are informative only in conjunction with others and so have no predictive value on their own. Ppred, the output of the native feature value predictor, is included for comparison. Predictors were trained by leave-one-out training on the benchmark set of 28 proteins; accuracy was computed on the left-out protein and averaged across the set. The “IG” column gives the average information gain, in bits per residue, of a predictor based only on Psamp and the indicated property over the baseline predictor Psamp: the total gain for features in each class for a given protein is divided by the number of residues in the protein, and the results are averaged across the proteins in our benchmark. Note that information gain can be large even for properties that do not increase accuracy if rare native feature values are often substantially enriched. The information gain given for Ppred is the gain when all properties are included in the predictor.
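The two measures described in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names and data layout are hypothetical. The information gain formula in particular is an assumption: the gain is taken to be the sum of log2(P_pred / P_samp) evaluated at the native feature values, divided by the number of residues. The caption does not state the formula explicitly.

```python
import math


def identifies_native(values, native, lower_is_better=False):
    """Return True if the property ranks the native feature value strictly best.

    values: dict mapping each candidate feature value to the property's score.
    native: the key of the native feature value.
    lower_is_better: set True for energy properties such as lowE and minE.
    """
    native_score = values[native]
    others = [score for value, score in values.items() if value != native]
    if lower_is_better:
        return all(native_score < score for score in others)
    return all(native_score > score for score in others)


def accuracy(features, lower_is_better=False):
    """Percentage of features whose native value the property identifies.

    features: list of (values, native) pairs, one per feature.
    """
    hits = sum(identifies_native(vals, nat, lower_is_better) for vals, nat in features)
    return 100.0 * hits / len(features)


def info_gain_per_residue(p_pred_native, p_samp_native, n_residues):
    """Assumed form of the gain over the Psamp baseline, in bits per residue.

    p_pred_native, p_samp_native: probabilities assigned to the native value
    of each feature by the predictor and by the sampling-rate baseline.
    """
    total_bits = sum(math.log2(p / q) for p, q in zip(p_pred_native, p_samp_native))
    return total_bits / n_residues
```

With two toy features where the property ranks the native value first in only one, `accuracy` returns 50.0. A predictor that doubles the probability of one native value and leaves another unchanged on a 4-residue protein yields 0.25 bits per residue under the assumed formula.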