Author manuscript; available in PMC: 2011 May 1.
Published in final edited form as: Proteins. 2010 May 1;78(6):1583–1593. doi: 10.1002/prot.22677

Table 1.

Properties used by the predictor, organized by feature class.

Feature value properties

Torsion meta-feature                                                Accuracy  IG
Psamp     Rosetta sampling rate                                     88.9%
lowE      10th percentile energy of models with the feature value   76.4%     0.016
minE      minimum energy of models with the feature value           87.7%     0.040
frag      rate of occurrence of the feature value in the fragments  86.2%     0.039
loop      indicates either an E or O torsion feature value

Ppred     output of nativeness predictor                            91.1%     0.081

Secondary structure meta-feature                                    Accuracy  IG
Psamp     Rosetta sampling rate                                     87.2%
lowE      10th percentile energy of models with the feature value   72.8%     0.018
minE      minimum energy of models with the feature value           86.2%     0.023
psipred   secondary structure prediction from Psipred               87.7%     0.034
jufo      secondary structure prediction from JUFO                  80.9%     0.010

Ppred     output of nativeness predictor                            91.8%     0.055

Topology meta-feature                                               Accuracy  IG
Psamp     Rosetta sampling rate                                     21.4%
lowE      10th percentile energy of models with the feature value   21.4%     0.032
minE      minimum energy of models with the feature value           46.4%     0.023
co        approximate contact order of a structure with the given topology

Ppred     output of nativeness predictor                            60.7%     0.036

Register meta-feature                                               Accuracy  IG
Psamp     Rosetta sampling rate                                     54.0%
lowE      10th percentile energy of models with the feature value   44.7%     0.065
minE      minimum energy of models with the feature value           61.2%     0.057
bulge     indicates the presence of at least one beta bulge in the register

Ppred     output of nativeness predictor                            57.6%     0.066

Contact meta-feature                                                Accuracy  IG
Psamp     Rosetta sampling rate                                     85.4%
lowE      10th percentile energy of models with the feature value   68.9%     0.002
edgedist  distance (in residue numbers) of a contact from either end of a pairing
oddpleat  indicates an anomaly in the pleating pattern

Ppred     output of nativeness predictor                            88.3%     0.005

A property correctly identifies a native feature value if the property is higher (or lower, in the case of energy properties) for the native feature value than for any other value of the associated feature. The “Accuracy” column indicates the percentage of features from our benchmark whose native values were correctly identified by each property. Accuracy values have been omitted for properties that are only informative in conjunction with others and so have no predictive value on their own. Ppred, the output of the native feature value predictor, is included here for comparison. Predictors were trained using leave-one-out training on the benchmark set of 28 proteins. Accuracy measures were computed on the left-out protein and averaged across the set. The “IG” column indicates the average information gain for a predictor Ppred based only on Psamp and the indicated property, versus the baseline predictor Psamp, in units of bits per residue: the total gain for features in each class for a given protein is divided by the number of residues in the protein. Results are averaged across proteins in our benchmark. Note that information gain can be large even for properties that do not yield accuracy increases if rare native feature values are often substantially enriched. The information gain given for Ppred is the gain when all properties are included in the predictor.
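
The two column definitions can be illustrated with a minimal Python sketch. It is not the authors' code. The data layout, the function names, and the toy numbers are hypothetical. The per-feature gain is assumed here to be log2 of the ratio of the predicted to the sampled probability of the native value, which the caption does not state explicitly.

```python
import math

def identifies_native(scores, native, lower_is_better=False):
    """True if the property is strictly better for the native value
    than for every other value of the same feature."""
    sign = -1.0 if lower_is_better else 1.0  # energies: lower is better
    native_score = sign * scores[native]
    return all(native_score > sign * s
               for value, s in scores.items() if value != native)

def accuracy(features, prop, lower_is_better=False):
    """Fraction of features whose native value the property identifies."""
    hits = sum(identifies_native(f[prop], f["native"], lower_is_better)
               for f in features)
    return hits / len(features)

def info_gain_per_residue(features, n_residues):
    """Total gain of p_pred over p_samp on native values, in bits,
    divided by the protein length (assumed log2-ratio form)."""
    total = sum(math.log2(f["p_pred"][f["native"]] / f["p_samp"][f["native"]])
                for f in features)
    return total / n_residues

# Hypothetical toy protein with 4 residues and 2 features.
FEATURES = [
    {"native": "A",
     "p_samp": {"A": 0.25, "B": 0.5, "C": 0.25},
     "p_pred": {"A": 0.5, "B": 0.25, "C": 0.25}},
    {"native": "E",
     "p_samp": {"E": 0.5, "H": 0.25, "L": 0.25},
     "p_pred": {"E": 0.5, "H": 0.25, "L": 0.25}},
]

acc_samp = accuracy(FEATURES, "p_samp")        # native "A" is undersampled
acc_pred = accuracy(FEATURES, "p_pred")
gain = info_gain_per_residue(FEATURES, n_residues=4)
```

In this toy case the rarely sampled native value "A" is enriched by the predictor, so the gain is positive. The second feature contributes no gain because its native value is already the most sampled one.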